Search results - "Virpioja, Sami"

Report

Uncertainty-Aware Natural Language Inference with Stochastic Weight Averaging

Authors: Talman, Aarne, Celikkanat, Hande, Virpioja, Sami, Heinonen, Markus, Tiedemann, Jörg

This paper introduces Bayesian uncertainty modeling using Stochastic Weight Averaging-Gaussian (SWAG) in Natural Language Understanding (NLU) tasks. We apply the approach to standard tasks in natural language inference (NLI) and demonstrate the effec

External link: http://arxiv.org/abs/2304.04726

Report

Democratizing Neural Machine Translation with OPUS-MT

Authors: Tiedemann, Jörg, Aulamo, Mikko, Bakshandaeva, Daria, Boggia, Michele, Grönroos, Stig-Arne, Nieminen, Tommi, Raganato, Alessandro, Scherrer, Yves, Vázquez, Raúl, Virpioja, Sami

This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows. We discuss our on-going mission o

External link: http://arxiv.org/abs/2212.01936

Report

FinChat: Corpus and evaluation setup for Finnish chat conversations on everyday topics

Authors: Leino, Katri, Leinonen, Juho, Singh, Mittul, Virpioja, Sami, Kurimo, Mikko

Creating open-domain chatbots requires large amounts of conversational data and related benchmark tasks to evaluate them. Standardized evaluation tasks are crucial for creating automatic evaluation metrics for model development; otherwise, comparing

External link: http://arxiv.org/abs/2008.08315

Report

Effects of Language Relatedness for Cross-lingual Transfer Learning in Character-Based Language Models

Authors: Singh, Mittul, Smit, Peter, Virpioja, Sami, Kurimo, Mikko

Character-based Neural Network Language Models (NNLM) have the advantage of smaller vocabulary and thus faster training times in comparison to NNLMs based on multi-character units. However, in low-resource scenarios, both the character and multi-char

External link: http://arxiv.org/abs/2007.11648

Report

Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search

Authors: Singh, Mittul, Virpioja, Sami, Smit, Peter, Kurimo, Mikko

In spoken Keyword Search, the query may contain out-of-vocabulary (OOV) words not observed when training the speech recognition system. Using subword language models (LMs) in the first-pass recognition makes it possible to recognize the OOV words, bu

External link: http://arxiv.org/abs/2005.13827

Report

Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

Authors: Grönroos, Stig-Arne, Virpioja, Sami, Kurimo, Mikko

There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or

External link: http://arxiv.org/abs/2004.04002

Report

Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning

Authors: Grönroos, Stig-Arne, Virpioja, Sami, Kurimo, Mikko

Data-driven segmentation of words into subword units has been used in various natural language processing applications such as automatic speech recognition and statistical machine translation for almost 20 years. Recently it has become more widely ad

External link: http://arxiv.org/abs/2003.03131

Report

The University of Helsinki submissions to the WMT19 news translation task

Authors: Talman, Aarne, Sulubacak, Umut, Vázquez, Raúl, Scherrer, Yves, Virpioja, Sami, Raganato, Alessandro, Hurskainen, Arvi, Tiedemann, Jörg

In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English. This year, we focused first on cleaning and filtering the t

External link: http://arxiv.org/abs/1906.04040

Report

Cognate-aware morphological segmentation for multilingual neural translation

Authors: Grönroos, Stig-Arne, Virpioja, Sami, Kurimo, Mikko

This article describes the Aalto University entry to the WMT18 News Translation Shared Task. We participate in the multilingual subtrack with a system trained under the constrained condition to translate from English to both Finnish and Estonian. The

External link: http://arxiv.org/abs/1808.10791

Report

Automatic Speech Recognition with Very Large Conversational Finnish and Estonian Vocabularies

Authors: Enarvi, Seppo, Smit, Peter, Virpioja, Sami, Kurimo, Mikko

Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 11, pp. 2085-2097, November 2017

Today, the vocabulary size for language models in large vocabulary speech recognition is typically several hundreds of thousands of words. While this is already sufficient in some applications, the out-of-vocabulary words are still limiting the usabi

External link: http://arxiv.org/abs/1707.04227