Showing 1 - 10 of 319 results for search: "Hasegawa‐Johnson, Mark"
The COVID-19 pandemic has underscored the need for low-cost, scalable approaches to measuring contactless vital signs, either during initial triage at a healthcare facility or virtual telemedicine visits. Remote photoplethysmography (rPPG) can accura
External link:
http://arxiv.org/abs/2410.15851
Published in:
Proceedings of Interspeech 2024
This paper enhances dysarthric and dysphonic speech recognition by fine-tuning pretrained automatic speech recognition (ASR) models on the 2023-10-05 data package of the Speech Accessibility Project (SAP), which contains the speech of 253 people with
External link:
http://arxiv.org/abs/2409.19818
Authors:
Wu, Junkai, Fan, Xulin, Lu, Bo-Ru, Jiang, Xilin, Mesgarani, Nima, Hasegawa-Johnson, Mark, Ostendorf, Mari
In recent years, we have observed a rapid advancement in speech language models (SpeechLLMs), catching up with humans' listening and reasoning abilities. SpeechLLMs have demonstrated impressive spoken dialog question-answering (SQA) performance in be
External link:
http://arxiv.org/abs/2409.04927
Test-Time Adaptation (TTA) has emerged as a crucial solution to the domain shift challenge, wherein the target environment diverges from the original training environment. A prime exemplification is TTA for Automatic Speech Recognition (ASR), which e
External link:
http://arxiv.org/abs/2408.05769
Authors:
Yoon, Eunseop, Yoon, Hee Suk, Eom, SooHwan, Han, Gunsoo, Nam, Daniel Wontae, Jo, Daejin, On, Kyoung-Woon, Hasegawa-Johnson, Mark A., Kim, Sungwoong, Yoo, Chang D.
Reinforcement Learning from Human Feedback (RLHF) leverages human preference data to train language models to align more closely with human essence. These human preference data, however, are labeled at the sequence level, creating a mismatch between
External link:
http://arxiv.org/abs/2407.16574
Authors:
Khan, Mohammad Nur Hossain, Li, Jialu, McElwain, Nancy L., Hasegawa-Johnson, Mark, Islam, Bashima
Certain environmental noises have been associated with negative developmental outcomes for infants and young children. Though classifying or tagging sound events in a domestic environment is an active research area, previous studies focused on data c
External link:
http://arxiv.org/abs/2406.17190
Authors:
Ni, Junrui, Wang, Liming, Zhang, Yang, Qian, Kaizhi, Gao, Heting, Hasegawa-Johnson, Mark, Yoo, Chang D.
Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text dat
External link:
http://arxiv.org/abs/2406.08380
Authors:
Yoon, Hee Suk, Yoon, Eunseop, Tee, Joshua Tian Jin, Hasegawa-Johnson, Mark, Li, Yingzhen, Yoo, Chang D.
In deep learning, test-time adaptation has gained attention as a method for model fine-tuning without the need for labeled data. A prime exemplification is the recently proposed test-time prompt tuning for large-scale vision-language models such as C
External link:
http://arxiv.org/abs/2403.14119
Authors:
Eom, SooHwan, Yoon, Eunseop, Yoon, Hee Suk, Kim, Chanwoo, Hasegawa-Johnson, Mark, Yoo, Chang D.
In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions. This phenomenon emerges as a side effect of Connectionist Temporal Classification (CTC), a robust sequence learning tool t
External link:
http://arxiv.org/abs/2403.11578
To understand why self-supervised learning (SSL) models have empirically achieved strong performances on several speech-processing downstream tasks, numerous studies have focused on analyzing the encoded information of the SSL layer representations i
External link:
http://arxiv.org/abs/2402.06888