Showing 1 - 10 of 56
for the search: '"Matt Post"'
Published in:
Interspeech 2022.
Neural transducers have been widely used in automatic speech recognition (ASR). In this paper, we introduce them to streaming end-to-end speech translation (ST), which aims to convert audio signals directly into text in other languages. Compared with ca…
Published in:
Transactions of the Association for Computational Linguistics, Vol 8, Pp 49-63 (2020)
Data privacy is an important issue for “machine learning as a service” providers. We focus on the problem of membership inference attacks: given a data sample and black-box access to a model’s API, determine whether the sample existed in the mo…
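The abstract above describes membership inference against a black-box model API. As an illustrative sketch only — this is not the paper's attack, and the function name, threshold, and confidence values are hypothetical — a minimal confidence-threshold attack can be written as:

```python
# Hypothetical sketch of a confidence-threshold membership inference attack.
# The attacker queries the black-box API for the model's confidence on the
# sample's true label; overfit models tend to be more confident on training
# members than on unseen samples.

def threshold_attack(confidence: float, tau: float = 0.9) -> bool:
    """Predict 'member' if confidence on the true label exceeds tau."""
    return confidence > tau

# Toy black-box responses (softmax probability of the true label):
print(threshold_attack(0.97))  # likely a training member -> True
print(threshold_attack(0.55))  # likely unseen -> False
```

The threshold `tau` would in practice be calibrated on shadow models or held-out data; here it is a fixed assumption to keep the sketch self-contained.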
We propose a novel scheme to use the Levenshtein Transformer to perform the task of word-level quality estimation. A Levenshtein Transformer is a natural fit for this task: trained to perform decoding in an iterative manner, a Levenshtein Transformer…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::edb07fdc30bd7eeff4c59e70a72e088b
http://arxiv.org/abs/2109.05611
Author:
Matthew Wiesner, Jacob Bremerman, Marco Turchi, Matt Post, Elizabeth Salesky, Matteo Negri, Roldano Cattoni, Douglas W. Oard
Published in:
Interspeech 2021.
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages. The corpus is a collection of audio recordings from TEDx talks in 8 source languages. We…
Published in:
AAAI
Scopus-Elsevier
Web of Science
We present ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of ParaNMT, we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of…
Author:
Matt Post, Rachel Wicks
Published in:
ACL/IJCNLP (1)
The sentence is a fundamental unit of text processing. Yet sentences in the wild are commonly encountered not in isolation, but unsegmented within larger paragraphs and documents. Therefore, the first step in many NLP pipelines is sentence segmentation…
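The abstract above motivates sentence segmentation as a first pipeline step. As a minimal rule-based baseline — an assumption for illustration, not the segmenter proposed in the paper — one can split on sentence-final punctuation while skipping a small abbreviation list:

```python
import re

# Hypothetical baseline segmenter: split after ., !, ? followed by
# whitespace and an uppercase letter, unless the preceding token is a
# known abbreviation. Real segmenters handle far more cases than this.
ABBREVS = {"Dr.", "Mr.", "Mrs.", "e.g.", "i.e.", "etc."}

def split_sentences(text: str) -> list:
    parts, start = [], 0
    for m in re.finditer(r'[.!?]\s+(?=[A-Z])', text):
        # token that ends the candidate sentence
        token = text[start:m.end()].strip().split()[-1]
        if token in ABBREVS:
            continue  # don't split after a known abbreviation
        parts.append(text[start:m.end()].strip())
        start = m.end()
    if start < len(text):
        parts.append(text[start:].strip())
    return parts

print(split_sentences("Dr. Post works on MT. He also builds tools."))
# -> ['Dr. Post works on MT.', 'He also builds tools.']
```

The abbreviation check is what keeps "Dr. Post" from being split; everything else is a straightforward regex over punctuation followed by an uppercase letter.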
Published in:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.
Machine translation models have discrete vocabularies and commonly use subword segmentation techniques to achieve an 'open vocabulary.' This approach relies on consistent and correct underlying Unicode sequences, and makes models susceptible to degradation…
Published in:
Bawden, R., Zhang, B., Yankovskaya, L., Tättar, A. & Post, M. 2020, "A Study in Improving BLEU Reference Coverage with Diverse Automatic Paraphrasing", in Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 918-932, The 2020 Conference on Empirical Methods in Natural Language Processing, virtual conference, 16/11/20. <https://www.aclweb.org/anthology/2020.findings-emnlp.82>
EMNLP (Findings)
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings
2020 Conference on Empirical Methods in Natural Language Processing: Findings
2020 Conference on Empirical Methods in Natural Language Processing: Findings, 2020, Punta Cana (online), Dominican Republic
Findings of the Association for Computational Linguistics: EMNLP 2020
HAL
We investigate a long-perceived shortcoming in the typical use of BLEU: its reliance on a single reference. Using modern neural paraphrasing techniques, we study whether automatically generating additional diverse references can provide better coverage…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::87173cd05b698e7ae5eaf8d093ac3ca9
https://www.pure.ed.ac.uk/ws/files/174905090/ParBleu_Syntactic_diversity_26.pdf
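The multi-reference idea in the abstract above rests on how BLEU's clipped n-gram counts combine references: an n-gram in the hypothesis is credited if *any* reference licenses it. A simplified sentence-level sketch (an assumption for illustration — it omits the brevity penalty and smoothing, and is not the paper's evaluation setup) looks like:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def multi_ref_bleu(hyp, refs, max_n=2):
    """BLEU-style geometric mean of clipped n-gram precisions.
    The clip count for each n-gram is the MAX count over all
    references, so adding diverse references can only help coverage.
    (Brevity penalty and smoothing omitted to keep the sketch short.)"""
    hyp_toks = hyp.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp_toks, n)
        ref_counts = Counter()
        for ref in refs:  # union of references via per-n-gram max
            for g, c in ngrams(ref.split(), n).items():
                ref_counts[g] = max(ref_counts[g], c)
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        precisions.append(clipped / max(sum(hyp_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)
```

With a single reference "a cat sat", the hypothesis "the cat sat" is penalized for "the"; adding "the cat sat" as a second reference lifts the score to 1.0, which is exactly the coverage effect the paper studies.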
Published in:
EMNLP (1)
Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings. We introduce Simulated Multiple Reference Training (SMRT), a novel M…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::dbbefda967a79a1a4329f100979c4725
http://arxiv.org/abs/2004.14524
Author:
Brian Thompson, Matt Post
Published in:
EMNLP (1)
We frame the task of machine translation evaluation as one of scoring machine translation output with a sequence-to-sequence paraphraser, conditioned on a human reference. We propose training the paraphraser as a multilingual NMT system, treating par…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d16fa5b49e4c9a6bf63544ffb7fb3f6b
http://arxiv.org/abs/2004.14564