Search results

Report

Enhancing disease detection in radiology reports through fine-tuning lightweight LLM on weak labels

Authors: Wei, Yishu; Wang, Xindi; Ong, Hanley; Zhou, Yiliang; Flanders, Adam; Shih, George; Peng, Yifan

Despite significant progress in applying large language models (LLMs) to the medical domain, several limitations still prevent them from practical applications. Among these are the constraints on model size and the lack of cohort-specific labeled dat

External link: http://arxiv.org/abs/2409.16563

Report

Robust Audiovisual Speech Recognition Models with Mixture-of-Experts

Authors: Wu, Yihan; Peng, Yifan; Lu, Yichen; Chang, Xuankai; Song, Ruihua; Watanabe, Shinji

Visual signals can enhance audiovisual speech recognition accuracy by providing additional contextual information. Given the complexity of visual signals, an audiovisual speech recognition model requires robust generalization capabilities across dive

External link: http://arxiv.org/abs/2409.12370

Report

ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

Authors: Someki, Masao; Choi, Kwanghee; Arora, Siddhant; Chen, William; Cornell, Samuele; Han, Jionghao; Peng, Yifan; Shi, Jiatong; Srivastav, Vaibhav; Watanabe, Shinji

We introduce ESPnet-EZ, an extension of the open-source speech processing toolkit ESPnet, aimed at quick and easy development of speech models. ESPnet-EZ focuses on two major aspects: (i) easy fine-tuning and inference of existing ESPnet models on va

External link: http://arxiv.org/abs/2409.09506

Report

In situ fully vectorial tomography and pupil function retrieval of tightly focused fields

Authors: Liu, Xin; Tu, Shijie; Hu, Yiwen; Peng, Yifan; Han, Yubing; Kuang, Cuifang; Liu, Xu; Hao, Xiang

Tightly focused optical fields are essential in nano-optics, but their applications have been limited by the challenges of accurate yet efficient characterization. In this article, we develop an in situ method for reconstructing the fully vectorial i

External link: http://arxiv.org/abs/2408.14852

Report

Closing the gap between open-source and commercial large language models for medical evidence summarization

Authors: Zhang, Gongbo; Jin, Qiao; Zhou, Yiliang; Wang, Song; Idnay, Betina R.; Luo, Yiming; Park, Elizabeth; Nestor, Jordan G.; Spotnitz, Matthew E.; Soroush, Ali; Campion, Thomas; Lu, Zhiyong; Weng, Chunhua; Peng, Yifan

Large language models (LLMs) hold great promise in summarizing medical evidence. Most recent studies focus on the application of proprietary LLMs. Using proprietary LLMs introduces multiple risk factors, including a lack of transparency and vendor de

External link: http://arxiv.org/abs/2408.00588

Report

SDoH-GPT: Using Large Language Models to Extract Social Determinants of Health (SDoH)

Authors: Consoli, Bernardo; Wu, Xizhi; Wang, Song; Zhao, Xinyu; Wang, Yanshan; Rousseau, Justin; Hartvigsen, Tom; Shen, Li; Wu, Huanmei; Peng, Yifan; Long, Qi; Chen, Tianlong; Ding, Ying

Extracting social determinants of health (SDoH) from unstructured medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. In this study we introduced SDoH-GPT, a sim

External link: http://arxiv.org/abs/2407.17126

Report

Multi-Convformer: Extending Conformer with Multiple Convolution Kernels

Authors: Prabhu, Darshan; Peng, Yifan; Jyothi, Preethi; Watanabe, Shinji

Convolutions have become essential in state-of-the-art end-to-end Automatic Speech Recognition (ASR) systems due to their efficient modelling of local context. Notably, its use in Conformers has led to superior performance compared to vanilla Transfo

External link: http://arxiv.org/abs/2407.03718

Report

Towards Robust Speech Representation Learning for Thousands of Languages

Authors: Chen, William; Zhang, Wangyou; Peng, Yifan; Li, Xinjian; Tian, Jinchuan; Shi, Jiatong; Chang, Xuankai; Maiti, Soumi; Livescu, Karen; Watanabe, Shinji

Self-supervised learning (SSL) has helped extend speech technologies to more languages by reducing the need for labeled data. However, models are still far from supporting the world's 7000+ languages. We propose XEUS, a Cross-lingual Encoder for Univ

External link: http://arxiv.org/abs/2407.00837

Report

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss

Authors: Shakeel, Muhammad; Sudo, Yui; Peng, Yifan; Watanabe, Shinji

Contextualized end-to-end automatic speech recognition has been an active research area, with recent efforts focusing on the implicit learning of contextual phrases based on the final loss objective. However, these approaches ignore the useful contex

External link: http://arxiv.org/abs/2406.16120

Report

On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models

Authors: Tian, Jinchuan; Peng, Yifan; Chen, William; Choi, Kwanghee; Livescu, Karen; Watanabe, Shinji

The Open Whisper-style Speech Model (OWSM) series was introduced to achieve full transparency in building advanced speech-to-text (S2T) foundation models. To this end, OWSM models are trained on 25 public speech datasets, which are heterogeneous in m

External link: http://arxiv.org/abs/2406.09282
