Výsledky vyhledávání - "Alami, Réda"

Report

Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory

Autor: Firdoussi, Aymane El, Seddik, Mohamed El Amine, Hayou, Soufiane, Alami, Reda, Alzubaidi, Ahmed, Hacid, Hakim

Synthetic data has gained attention for training large language models, but poor-quality data can harm performance (see, e.g., Shumailov et al. (2023); Seddik et al. (2024)). A potential solution is data pruning, which retains only high-quality data

Externí odkaz: http://arxiv.org/abs/2410.08942

Zobrazit plný text záznamu

Report

Alignment with Preference Optimization Is All You Need for LLM Safety

Autor: Alami, Reda, Almansoori, Ali Khalifa, Alzubaidi, Ahmed, Seddik, Mohamed El Amine, Farooq, Mugariya, Hacid, Hakim

We demonstrate that preference optimization methods can effectively enhance LLM safety. Applying various alignment techniques to the Falcon 11B model using safety datasets, we achieve a significant boost in global safety score (from $57.64\%$ to $99.

Externí odkaz: http://arxiv.org/abs/2409.07772

Zobrazit plný text záznamu

Report

Falcon2-11B Technical Report

Autor: Malartic, Quentin, Chowdhury, Nilabhra Roy, Cojocaru, Ruxandra, Farooq, Mugariya, Campesan, Giulia, Djilali, Yasser Abdelaziz Dahou, Narayan, Sanath, Singh, Ankit, Velikanov, Maksim, Boussaha, Basma El Amel, Al-Yafeai, Mohammed, Alobeidli, Hamza, Qadi, Leen Al, Seddik, Mohamed El Amine, Fedyanin, Kirill, Alami, Reda, Hacid, Hakim

We introduce Falcon2-11B, a foundation model trained on over five trillion tokens, and its multimodal counterpart, Falcon2-11B-vlm, which is a vision-to-text model. We report our findings during the training of the Falcon2-11B which follows a multi-s

Externí odkaz: http://arxiv.org/abs/2407.14885

Zobrazit plný text záznamu

Report

Investigating Regularization of Self-Play Language Models

Autor: Alami, Reda, Abubaker, Abdalgader, Achab, Mastane, Seddik, Mohamed El Amine, Lahlou, Salem

This paper explores the effects of various forms of regularization in the context of language model alignment via self-play. While both reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) require to collect cost

Externí odkaz: http://arxiv.org/abs/2404.04291

Zobrazit plný text záznamu

Report

SCAFFLSA: Taming Heterogeneity in Federated Linear Stochastic Approximation and TD Learning

Autor: Mangold, Paul, Samsonov, Sergey, Labbi, Safwan, Levin, Ilya, Alami, Reda, Naumov, Alexey, Moulines, Eric

In this paper, we analyze the sample and communication complexity of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the effects of local training with agent heterogeneity. We show that the communication compl

Externí odkaz: http://arxiv.org/abs/2402.04114

Zobrazit plný text záznamu

Report

A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits

Autor: Alami, Reda, Mahfoud, Mohammed, Achab, Mastane

In a typical stochastic multi-armed bandit problem, the objective is often to maximize the expected sum of rewards over some time horizon $T$. While the choice of a strategy that accomplishes that is optimal with no additional information, it is no l

Externí odkaz: http://arxiv.org/abs/2310.19821

Zobrazit plný text záznamu

Report

Deep Reinforcement Learning Algorithms for Hybrid V2X Communication: A Benchmarking Study

Autor: Boukhalfa, Fouzi, Alami, Reda, Achab, Mastane, Moulines, Eric, Bennis, Mehdi

In today's era, autonomous vehicles demand a safety level on par with aircraft. Taking a cue from the aerospace industry, which relies on redundancy to achieve high reliability, the automotive sector can also leverage this concept by building redunda

Externí odkaz: http://arxiv.org/abs/2310.03767

Zobrazit plný text záznamu

Report

Emergent Communication in Multi-Agent Reinforcement Learning for Future Wireless Networks

Autor: Chafii, Marwa, Naoumi, Salmane, Alami, Reda, Almazrouei, Ebtesam, Bennis, Mehdi, Debbah, Merouane

In different wireless network scenarios, multiple network entities need to cooperate in order to achieve a common task with minimum delay and energy consumption. Future wireless networks mandate exchanging high dimensional data in dynamic and uncerta

Externí odkaz: http://arxiv.org/abs/2309.06021

Zobrazit plný text záznamu

Report

One-Step Distributional Reinforcement Learning

Autor: Achab, Mastane, Alami, Reda, Djilali, Yasser Abdelaziz Dahou, Fedyanin, Kirill, Moulines, Eric

Reinforcement learning (RL) allows an agent interacting sequentially with an environment to maximize its long-term expected return. In the distributional RL (DistrRL) paradigm, the agent goes beyond the limit of the expected value, to capture the und

Externí odkaz: http://arxiv.org/abs/2304.14421

Zobrazit plný text záznamu

Report

Regularization of the policy updates for stabilizing Mean Field Games

Autor: Algumaei, Talal, Solozabal, Ruben, Alami, Reda, Hacid, Hakim, Debbah, Merouane, Takac, Martin

This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL) where multiple agents interact in the same environment and whose goal is to maximize the individual returns. Challenges arise when scaling up the number of agents due to the

Externí odkaz: http://arxiv.org/abs/2304.01547

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání