Showing 1 - 10 of 135 for search: '"Alishahi, Afra"'
Transformer-based language models have shown an excellent ability to effectively capture and utilize contextual information. Although various analysis techniques have been used to quantify and trace the contribution of single contextual cues to a target…
External link: http://arxiv.org/abs/2410.03447
Neural speech models build deeply entangled internal representations, which capture a variety of features (e.g., fundamental frequency, loudness, syntactic category, or semantic content of a word) in a distributed encoding. This complexity makes it difficult…
External link: http://arxiv.org/abs/2410.03037
Human listeners effortlessly compensate for phonological changes during speech perception, often unconsciously inferring the intended sounds. For example, listeners infer the underlying /n/ when hearing an utterance such as "clea[m] pan", where [m]…
External link: http://arxiv.org/abs/2406.15265
Interpretability research has shown that self-supervised Spoken Language Models (SLMs) encode a wide variety of features in human speech from the acoustic, phonetic, phonological, syntactic and semantic levels, to speaker characteristics. The bulk of…
External link: http://arxiv.org/abs/2403.16865
Transformers have become a key architecture in speech processing, but our understanding of how they build up representations of acoustic and linguistic structure is limited. In this study, we address this gap by investigating how measures of 'context-mixing'…
External link: http://arxiv.org/abs/2310.09925
Published in: Proceedings of Interspeech 2023
Understanding which information is encoded in deep models of spoken and written language has been the focus of much research in recent years, as it is crucial for debugging and improving these architectures. Most previous work has focused on probing…
External link: http://arxiv.org/abs/2305.18957
Self-attention weights and their transformed variants have been the main source of information for analyzing token-to-token interactions in Transformer-based models. But despite their ease of interpretation, these weights are not faithful to the model…
External link: http://arxiv.org/abs/2301.12971
Recent computational models of the acquisition of spoken language via grounding in perception exploit associations between the spoken and visual modalities and learn to represent speech and visual data in a joint vector space. A major unresolved issue…
External link: http://arxiv.org/abs/2202.12917
Authors: Alishahi, Afra, Chrupała, Grzegorz, Cristia, Alejandrina, Dupoux, Emmanuel, Higy, Bertrand, Lavechin, Marvin, Räsänen, Okko, Yu, Chen
We present the visually-grounded language modelling track that was introduced in the Zero-Resource Speech challenge, 2021 edition, 2nd round. We motivate the new track and discuss participation rules in detail. We also present the two baseline systems…
External link: http://arxiv.org/abs/2107.06546