Showing 1 - 10 of 53 for search: '"Ciprian Chelba"'
Author:
Rami Botros, Rohit Prabhavalkar, Johan Schalkwyk, Ciprian Chelba, Tara N. Sainath, Françoise Beaufays
In end-to-end (E2E) speech recognition models, a representational tight-coupling inevitably emerges between the encoder and the decoder. We build upon recent work that has begun to explore building encoders with modular encoded representations, such
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8a9eb14f07d3c1400c300713bbe63244
Published in:
ACL (1)
Noise and domain are important aspects of data quality for neural machine translation. Existing research focuses separately on domain-data selection, clean-data selection, or their static combination, leaving the dynamic interaction across them not exp
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::213ae8dd704c7dfaee19b092566bb6e6
Published in:
WMT (1)
Recent work in Neural Machine Translation (NMT) has shown significant quality gains from noised-beam decoding during back-translation, a method to generate synthetic parallel data. We show that the main role of such synthetic noise is not to diversif
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::17cb9e745e5422c553c84deb26579fde
Published in:
WMT
Measuring domain relevance of data and identifying or selecting well-fit domain data for machine translation (MT) is a well-studied topic, but denoising is not yet. Denoising is concerned with a different type of data quality and tries to reduce the
Published in:
INTERSPEECH
We present Sparse Non-negative Matrix (SNM) estimation, a novel probability estimation technique for language modeling that can efficiently incorporate arbitrary features. We evaluate SNM language models on two corpora: the One Billion Word Benchmark
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::71185e1026c495bd2386ba3f8a2ce86e
https://lirias.kuleuven.be/handle/123456789/543949
Author:
Ciprian Chelba, Noam Shazeer
Published in:
ASRU
The paper investigates the impact on query language modeling when using skip-grams within query as well as across queries in a given search session, in conjunction with the geo-annotation available for the query stream data. As modeling tool we use t
Published in:
IEEE Signal Processing Magazine. 25:39-49
Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, information search and retrieval ha
Published in:
Computer Speech & Language. 21:458-478
The paper presents the Position Specific Posterior Lattice (PSPL), a novel lossy representation of automatic speech recognition lattices that naturally lends itself to efficient indexing and subsequent relevance ranking of spoken documents. This tech
Published in:
INTERSPEECH
We investigate the benefit of augmenting with geo-location information the language model used in speech recognition for voice-search. We observe reductions in perplexity of up to 15% relative on test sets obtained from both typed query data, as well