Showing 1 - 10 of 56
for the search: '"Matt Post"'
Published in:
Interspeech 2022.
Neural transducers have been widely used in automatic speech recognition (ASR). In this paper, we introduce them to streaming end-to-end speech translation (ST), which aims to convert audio signals directly into text in other languages. Compared with ca…
Published in:
Transactions of the Association for Computational Linguistics, Vol 8, Pp 49-63 (2020)
Data privacy is an important issue for “machine learning as a service” providers. We focus on the problem of membership inference attacks: given a data sample and black-box access to a model’s API, determine whether the sample existed in the mo…
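The abstract above describes membership inference against a black-box model API. As an illustrative sketch only — this is not the paper's attack, and the function name, threshold, and confidence values are hypothetical — a minimal confidence-threshold attack can be written as:

```python
# Hypothetical sketch of a confidence-threshold membership inference attack.
# The attacker queries the black-box API for the model's confidence on the
# sample's true label; overfit models tend to be more confident on training
# members than on unseen samples.

def threshold_attack(confidence: float, tau: float = 0.9) -> bool:
    """Predict 'member' if confidence on the true label exceeds tau."""
    return confidence > tau

# Toy black-box responses (softmax probability of the true label):
print(threshold_attack(0.97))  # likely a training member -> True
print(threshold_attack(0.55))  # likely unseen -> False
```

The threshold `tau` would in practice be calibrated on shadow models or held-out data; here it is a fixed assumption to keep the sketch self-contained.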
We propose a novel scheme to use the Levenshtein Transformer to perform the task of word-level quality estimation. A Levenshtein Transformer is a natural fit for this task: trained to perform decoding in an iterative manner, a Levenshtein Transformer…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::edb07fdc30bd7eeff4c59e70a72e088b
http://arxiv.org/abs/2109.05611
Author:
Matthew Wiesner, Jacob Bremerman, Marco Turchi, Matt Post, Elizabeth Salesky, Matteo Negri, Roldano Cattoni, Douglas W. Oard
Published in:
Interspeech 2021.
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages. The corpus is a collection of audio recordings from TEDx talks in 8 source languages. We…
Published in:
AAAI
Scopus-Elsevier
Web of Science
We present ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of ParaNMT, we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of…
Author:
Matt Post, Rachel Wicks
Published in:
ACL/IJCNLP (1)
The sentence is a fundamental unit of text processing. Yet sentences in the wild are commonly encountered not in isolation, but unsegmented within larger paragraphs and documents. Therefore, the first step in many NLP pipelines is sentence segmentation…
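The abstract above motivates sentence segmentation as a first pipeline step. As a minimal rule-based baseline — an assumption for illustration, not the segmenter proposed in the paper — one can split on sentence-final punctuation while skipping a small abbreviation list:

```python
import re

# Hypothetical baseline segmenter: split after ., !, ? followed by
# whitespace and an uppercase letter, unless the preceding token is a
# known abbreviation. Real segmenters handle far more cases than this.
ABBREVS = {"Dr.", "Mr.", "Mrs.", "e.g.", "i.e.", "etc."}

def split_sentences(text: str) -> list:
    parts, start = [], 0
    for m in re.finditer(r'[.!?]\s+(?=[A-Z])', text):
        # token that ends the candidate sentence
        token = text[start:m.end()].strip().split()[-1]
        if token in ABBREVS:
            continue  # don't split after a known abbreviation
        parts.append(text[start:m.end()].strip())
        start = m.end()
    if start < len(text):
        parts.append(text[start:].strip())
    return parts

print(split_sentences("Dr. Post works on MT. He also builds tools."))
# -> ['Dr. Post works on MT.', 'He also builds tools.']
```

The abbreviation check is what keeps "Dr. Post" from being split; everything else is a straightforward regex over punctuation followed by an uppercase letter.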
Published in:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.
Machine translation models have discrete vocabularies and commonly use subword segmentation techniques to achieve an 'open vocabulary.' This approach relies on consistent and correct underlying Unicode sequences, and makes models susceptible to degradation…
Published in:
Bawden, R., Zhang, B., Yankovskaya, L., Tättar, A. & Post, M. 2020, "A Study in Improving BLEU Reference Coverage with Diverse Automatic Paraphrasing", in Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 918-932, The 2020 Conference on Empirical Methods in Natural Language Processing, virtual conference, 16/11/20. <https://www.aclweb.org/anthology/2020.findings-emnlp.82>
EMNLP (Findings)
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings
2020 Conference on Empirical Methods in Natural Language Processing: Findings
2020 Conference on Empirical Methods in Natural Language Processing: Findings, 2020, Punta Cana (online), Dominican Republic
Findings of the Association for Computational Linguistics: EMNLP 2020
HAL
We investigate a long-perceived shortcoming in the typical use of BLEU: its reliance on a single reference. Using modern neural paraphrasing techniques, we study whether automatically generating additional diverse references can provide better coverage…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::87173cd05b698e7ae5eaf8d093ac3ca9
https://www.pure.ed.ac.uk/ws/files/174905090/ParBleu_Syntactic_diversity_26.pdf
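The multi-reference idea in the abstract above rests on how BLEU's clipped n-gram counts combine references: an n-gram in the hypothesis is credited if *any* reference licenses it. A simplified sentence-level sketch (an assumption for illustration — it omits the brevity penalty and smoothing, and is not the paper's evaluation setup) looks like:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def multi_ref_bleu(hyp, refs, max_n=2):
    """BLEU-style geometric mean of clipped n-gram precisions.
    The clip count for each n-gram is the MAX count over all
    references, so adding diverse references can only help coverage.
    (Brevity penalty and smoothing omitted to keep the sketch short.)"""
    hyp_toks = hyp.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp_toks, n)
        ref_counts = Counter()
        for ref in refs:  # union of references via per-n-gram max
            for g, c in ngrams(ref.split(), n).items():
                ref_counts[g] = max(ref_counts[g], c)
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        precisions.append(clipped / max(sum(hyp_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)
```

With a single reference "a cat sat", the hypothesis "the cat sat" is penalized for "the"; adding "the cat sat" as a second reference lifts the score to 1.0, which is exactly the coverage effect the paper studies.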
Published in:
EMNLP (1)
Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings. We introduce Simulated Multiple Reference Training (SMRT), a novel M…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::dbbefda967a79a1a4329f100979c4725
http://arxiv.org/abs/2004.14524
Author:
Brian Thompson, Matt Post
Published in:
EMNLP (1)
We frame the task of machine translation evaluation as one of scoring machine translation output with a sequence-to-sequence paraphraser, conditioned on a human reference. We propose training the paraphraser as a multilingual NMT system, treating par…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d16fa5b49e4c9a6bf63544ffb7fb3f6b
http://arxiv.org/abs/2004.14564