Showing 1 - 10 of 754 for search: '"Williamson, P. S."'
Self-supervised learning (SSL) has grown in interest within the speech processing community, since it produces representations that are useful for many downstream tasks. SSL uses global and contextual methods to produce robust representations, where
External link:
http://arxiv.org/abs/2411.04379
Carrying conversations in multi-sound environments is one of the more challenging tasks, since the sounds overlap across time and frequency making it difficult to understand a single sound source. One proposed approach to help isolate an attended spe
External link:
http://arxiv.org/abs/2410.18395
Objective speech quality measures are typically used to assess speech enhancement algorithms, but it has been shown that they are sub-optimal as learning objectives because they do not always align well with human subjective ratings. This misalignmen
External link:
http://arxiv.org/abs/2410.13182
Authors:
Kibria, Imran E.; Williamson, Donald S.
Speech quality is best evaluated by human feedback using mean opinion scores (MOS). However, variance in ratings between listeners can introduce noise in the true quality label of an utterance. Currently, deep learning networks including convolutiona
External link:
http://arxiv.org/abs/2410.12675
Integrating Electronic Health Records (EHR) and the application of machine learning present opportunities for enhancing the accuracy and accessibility of data-driven diabetes prediction. In particular, developing data-driven machine learning models c
External link:
http://arxiv.org/abs/2408.12029
Authors:
Liu, Yuchen; Ong, Natasha; Peng, Kaiyan; Xiong, Bo; Wang, Qifan; Hou, Rui; Khabsa, Madian; Yang, Kaiyue; Liu, David; Williamson, Donald S.; Yu, Hanchao
We present Multiscale Multiview Vision Transformers (MMViT), which introduces multiscale feature maps and multiview encodings to transformer models. Our model encodes different views of the input signal and builds several channel-resolution feature s
External link:
http://arxiv.org/abs/2305.00104
Perceptually-inspired objective functions such as the perceptual evaluation of speech quality (PESQ), signal-to-distortion ratio (SDR), and short-time objective intelligibility (STOI) have recently been used to optimize performance of deep-learning-
External link:
http://arxiv.org/abs/2303.13685
Dereverberation is often performed directly on the reverberant audio signal, without knowledge of the acoustic environment. Reverberation time, T60, however, is an essential acoustic factor that reflects how reverberation may impact a signal. In this
External link:
http://arxiv.org/abs/2302.04932
Authors:
Yi, Gaoxiong; Xiao, Wei; Xiao, Yiming; Naderi, Babak; Möller, Sebastian; Wardah, Wafaa; Mittag, Gabriel; Cutler, Ross; Zhang, Zhuohuang; Williamson, Donald S.; Chen, Fei; Yang, Fuzheng; Shang, Shidong
With the advances in speech communication systems such as online conferencing applications, we can seamlessly work with people regardless of where they are. However, during online meetings, speech quality can be significantly affected by background n
External link:
http://arxiv.org/abs/2203.16032
Climate change is predicted to lead to major changes in terrestrial ecosystems. However, significant differences in climate model projections for given scenarios of greenhouse gas emissions continue to hinder detailed assessment. Here we show, using
External link:
http://arxiv.org/abs/2203.13831