Showing 1 - 10 of 1,383 for search: '"Martins, André A."'
Associative memory models, such as Hopfield networks and their modern variants, have garnered renewed interest due to advancements in memory capacity and connections with self-attention in transformers. In this work, we introduce a unified framework…
External link:
http://arxiv.org/abs/2411.08590
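The connection this abstract mentions between modern Hopfield networks and self-attention can be made concrete. Below is a minimal NumPy sketch (illustrative only, not the paper's unified framework) of the modern Hopfield retrieval update, xi <- softmax(beta * X xi)^T X, which has exactly the form of softmax attention with the stored patterns serving as both keys and values; all names are hypothetical.

import numpy as np

def softmax(z):
    z = z - z.max()            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def hopfield_retrieve(X, xi, beta=8.0, steps=3):
    """X: (n_patterns, dim) stored patterns; xi: (dim,) noisy query."""
    for _ in range(steps):
        xi = softmax(beta * (X @ xi)) @ X   # one attention-style update
    return xi

rng = np.random.default_rng(0)
patterns = rng.standard_normal((5, 16))
query = patterns[2] + 0.3 * rng.standard_normal(16)   # corrupted pattern
retrieved = hopfield_retrieve(patterns, query)
print(int(np.argmax(patterns @ retrieved)))           # recovers index 2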
Author:
Ramos, Miguel Moura, Almeida, Tomás, Vareta, Daniel, Azevedo, Filipe, Agrawal, Sweta, Fernandes, Patrick, Martins, André F. T.
Reinforcement learning (RL) has been proven to be an effective and robust method for training neural machine translation systems, especially when paired with powerful reward models that accurately assess translation quality. However, most research ha…
External link:
http://arxiv.org/abs/2411.05986
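For orientation, the snippet above refers to the generic RL recipe for MT: sample an output from the policy, score it with a reward model, and increase the log-probability of high-reward samples. The toy below is a hypothetical stand-in, not the paper's setup; REINFORCE is run on a four-way choice playing the role of "pick the best candidate translation", with a fixed reward table standing in for a learned reward model.

import torch

# Toy policy: a categorical distribution over 4 candidate "translations".
logits = torch.zeros(4, requires_grad=True)
optimizer = torch.optim.SGD([logits], lr=0.5)
rewards = torch.tensor([0.1, 0.2, 1.0, 0.3])   # stand-in reward-model scores

for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    choice = dist.sample()
    advantage = rewards[choice] - rewards.mean()   # crude baseline
    loss = -advantage * dist.log_prob(choice)      # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(logits.softmax(-1))   # probability mass concentrates on candidate 2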
Large language models (LLMs) have achieved state-of-the-art performance in machine translation (MT) and demonstrated the ability to leverage in-context learning through few-shot examples. However, the mechanisms by which LLMs use different parts of t…
External link:
http://arxiv.org/abs/2410.16246
The automatic assessment of translation quality has recently become crucial across several stages of the translation pipeline, from data curation to training and decoding. Although quality estimation (QE) metrics have been optimized to align with hum…
External link:
http://arxiv.org/abs/2410.10995
Author:
Agrawal, Sweta, de Souza, José G. C., Rei, Ricardo, Farinhas, António, Faria, Gonçalo, Fernandes, Patrick, Guerreiro, Nuno M., Martins, André
Alignment with human preferences is an important step in developing accurate and safe large language models. This is no exception in machine translation (MT), where better handling of language nuances and context-specific variations leads to improved…
External link:
http://arxiv.org/abs/2410.07779
Author:
Martins, Pedro Henrique, Fernandes, Patrick, Alves, João, Guerreiro, Nuno M., Rei, Ricardo, Alves, Duarte M., Pombal, José, Farajian, Amin, Faysse, Manuel, Klimaszewski, Mateusz, Colombo, Pierre, Haddow, Barry, de Souza, José G. C., Birch, Alexandra, Martins, André F. T.
The quality of open-weight LLMs has seen significant improvement, yet they remain predominantly focused on English. In this paper, we introduce the EuroLLM project, aimed at developing a suite of open-weight multilingual LLMs capable of understanding…
External link:
http://arxiv.org/abs/2409.16235
To ensure large language models (LLMs) are used safely, one must reduce their propensity to hallucinate or to generate unacceptable answers. A simple and often used strategy is to first let the LLM generate multiple hypotheses and then employ a reran…
External link:
http://arxiv.org/abs/2409.07131
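The "generate multiple hypotheses, then rerank" strategy this snippet describes is easy to state in code. A minimal sketch, assuming hypothetical generate(prompt) and score(prompt, hypothesis) callables standing in for an LLM sampler and a reranker or reward model:

import random

def rerank(prompt, generate, score, n_samples=8):
    # Sample several hypotheses, keep the one the reranker scores highest.
    hypotheses = [generate(prompt) for _ in range(n_samples)]
    return max(hypotheses, key=lambda h: score(prompt, h))

# Toy usage with stand-in generator and scorer:
random.seed(0)
candidates = ["short", "a longer answer", "the longest answer of all"]
best = rerank("question?",
              generate=lambda p: random.choice(candidates),
              score=lambda p, h: len(h))   # toy scorer: prefer longer text
print(best)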
Recently, a diverse set of decoding and reranking procedures have been shown effective for LLM-based code generation. However, a comprehensive framework that links and experimentally compares these methods is missing. We address this by proposing Dec…
External link:
http://arxiv.org/abs/2408.13745
Transformers are the current architecture of choice for NLP, but their attention layers do not scale well to long contexts. Recent works propose to replace attention with linear recurrent layers -- this is the case for state space models, which enjoy…
External link:
http://arxiv.org/abs/2407.05489
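To make the scaling contrast concrete: a linear recurrent layer of the kind state space models use maintains a fixed-size state updated once per token, so per-step decoding cost is constant in sequence length, whereas attention must revisit the whole growing context. A minimal sketch with a diagonal state matrix (illustrative, not any specific SSM parameterization):

import numpy as np

def linear_recurrence(A, B, C, xs):
    """h_t = A * h_{t-1} + B * x_t ; y_t = C @ h_t  (diagonal A)."""
    h = np.zeros_like(A)
    ys = []
    for x in xs:               # one O(d) update per token
        h = A * h + B * x      # elementwise: A is the diagonal of the state matrix
        ys.append(C @ h)
    return np.array(ys)

d = 8
A = np.full(d, 0.9)            # stable decay (|A| < 1)
B = np.ones(d)
C = np.ones(d) / d
print(linear_recurrence(A, B, C, xs=[1.0, 0.0, 0.0, 0.0]))  # decaying impulse response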
Recent studies have highlighted the potential of exploiting parallel corpora to enhance multilingual large language models, improving performance in both bilingual tasks, e.g., machine translation, and general-purpose tasks, e.g., text classification…
External link:
http://arxiv.org/abs/2407.00436