Showing 1 - 10
of 176
for search: '"Goldwater, Sharon"'
Self-supervised speech representations can hugely benefit downstream speech technologies, yet the properties that make them useful are still poorly understood. Two candidate properties related to the geometry of the representation space have been hyp
External link:
http://arxiv.org/abs/2406.09200
When annotating multi-dialect Arabic datasets, it is common to randomly assign the samples across a pool of native Arabic speakers. Recent analyses recommended routing dialectal samples to native speakers of their respective dialects to build higher-qu
External link:
http://arxiv.org/abs/2405.11282
Speech perception involves storing and integrating sequentially presented items. Recent work in cognitive neuroscience has identified temporal and contextual characteristics in humans' neural encoding of speech that may facilitate this temporal proce
External link:
http://arxiv.org/abs/2405.08237
Transcribed speech and user-generated text in Arabic typically contain a mixture of Modern Standard Arabic (MSA), the standardized language taught in schools, and Dialectal Arabic (DA), used in daily communications. To handle this variation, previous
External link:
http://arxiv.org/abs/2310.13747
Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units. For unsupervised systems, these are mined using k-nearest neighbor (KNN) search, which is slow. Recently, mean-pooled representations from a
External link:
http://arxiv.org/abs/2306.02153
Self-supervised speech representations are known to encode both speaker and phonetic information, but how they are distributed in the high-dimensional space remains largely unexplored. We hypothesize that they are encoded in orthogonal subspaces, a p
External link:
http://arxiv.org/abs/2305.12464
Parsing spoken dialogue presents challenges that parsing text does not, including a lack of clear sentence boundaries. We know from previous work that prosody helps in parsing single sentences (Tran et al. 2018), but we want to show the effect of pro
External link:
http://arxiv.org/abs/2302.12165
Given the strong results of self-supervised models on various tasks, there have been surprisingly few studies exploring self-supervised representations for acoustic word embeddings (AWE), fixed-dimensional vectors representing variable-length spoken
External link:
http://arxiv.org/abs/2210.16043
Authors:
Szubert, Ida; Abend, Omri; Schneider, Nathan; Gibbon, Samuel; Mahon, Louis; Goldwater, Sharon; Steedman, Mark
This paper proposes a methodology for constructing such corpora of child directed speech (CDS) paired with sentential logical forms, and uses this method to create two such corpora, in English and Hebrew. The approach enforces a cross-linguistically
External link:
http://arxiv.org/abs/2109.10952
Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks. Previous papers have suggested that for sequence-to-sequence models trained on tasks such as speech translation or speech recognition, attention
External link:
http://arxiv.org/abs/2109.10107