Showing 1 - 10 of 336
for search: '"Jaggi, Martin"'
We target on-device collaborative fine-tuning of Large Language Models (LLMs) by adapting a Mixture of Experts (MoE) architecture, where experts are Low-Rank Adaptation (LoRA) modules. In conventional MoE approaches, experts develop into specialists…
External link:
http://arxiv.org/abs/2409.13931
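Since the abstract only sketches the architecture, a minimal illustration of the idea follows: an MoE layer whose experts are LoRA modules, i.e. a frozen base linear layer plus a softmax-gated mixture of low-rank updates. All class names, the rank, and the number of experts are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LoRAExpert(nn.Module):
    """One expert: a low-rank update x -> (x @ A @ B) * (alpha / rank)."""
    def __init__(self, d_in, d_out, rank=8, alpha=16.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, d_out))  # zero init: update starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        return (x @ self.A @ self.B) * self.scale

class MoLoRALayer(nn.Module):
    """Frozen base layer plus a softmax-gated mixture of LoRA experts."""
    def __init__(self, base: nn.Linear, num_experts=4, rank=8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # only experts and gate are fine-tuned
        self.experts = nn.ModuleList(
            LoRAExpert(base.in_features, base.out_features, rank)
            for _ in range(num_experts))
        self.gate = nn.Linear(base.in_features, num_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)              # (batch, E)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)   # (batch, d_out, E)
        return self.base(x) + (outs * weights.unsqueeze(-2)).sum(dim=-1)
```

In a full model every attention or MLP projection would be wrapped this way; only the gate and the small LoRA matrices are trained, which is what makes on-device fine-tuning plausible.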
Collaborative learning is an important tool to train multiple clients more effectively by enabling communication among clients. Identifying helpful clients, however, is challenging and often introduces significant overhead. In this paper, we…
External link:
http://arxiv.org/abs/2409.05539
Author:
Chayti, El Mahdi, Jaggi, Martin
Learning new tasks by drawing on prior experience gathered from other (related) tasks is a core property of any intelligent system. Gradient-based meta-learning, especially MAML and its variants, has emerged as a viable solution to accomplish this goal…
External link:
http://arxiv.org/abs/2409.03682
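The MAML loop the abstract alludes to is compact enough to sketch. Below is one meta-step of first-order MAML (the common approximation that drops second-order terms): adapt a copy of the model on each task's support set, then use the adapted copy's query-set gradient as the meta-gradient for the shared initialization. The task format and hyperparameters are assumptions for illustration.

```python
import copy
import torch

def fomaml_step(model, tasks, loss_fn, inner_lr=1e-2, outer_lr=1e-3, inner_steps=1):
    """One meta-update over a list of ((x_s, y_s), (x_q, y_q)) tasks."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for (x_support, y_support), (x_query, y_query) in tasks:
        learner = copy.deepcopy(model)                    # task-specific fast weights
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                      # inner loop: adapt on support set
            inner_opt.zero_grad()
            loss_fn(learner(x_support), y_support).backward()
            inner_opt.step()
        inner_opt.zero_grad()
        # Outer loss on the query set; first-order MAML reuses the adapted
        # model's gradient as the meta-gradient of the initialization.
        loss_fn(learner(x_query), y_query).backward()
        for g, p in zip(meta_grads, learner.parameters()):
            g += p.grad / len(tasks)
    with torch.no_grad():                                 # outer step on the shared init
        for p, g in zip(model.parameters(), meta_grads):
            p -= outer_lr * g
```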
Author:
Borges, Beatriz, Foroutan, Negar, Bayazit, Deniz, Sotnikova, Anna, Montariol, Syrielle, Nazaretzky, Tanya, Banaei, Mohammadreza, Sakhaeirad, Alireza, Servant, Philippe, Neshaei, Seyed Parsa, Frej, Jibril, Romanou, Angelika, Weiss, Gail, Mamooler, Sepideh, Chen, Zeming, Fan, Simin, Gao, Silin, Ismayilzada, Mete, Paul, Debjit, Schöpfer, Alexandre, Janchevski, Andrej, Tiede, Anja, Linden, Clarence, Troiani, Emanuele, Salvi, Francesco, Behrens, Freya, Orsi, Giacomo, Piccioli, Giovanni, Sevel, Hadrien, Coulon, Louis, Pineros-Rodriguez, Manuela, Bonnassies, Marin, Hellich, Pierre, van Gerwen, Puck, Gambhir, Sankalp, Pirelli, Solal, Blanchard, Thomas, Callens, Timothée, Aoun, Toni Abi, Alonso, Yannick Calvino, Cho, Yuri, Chiappa, Alberto, Sclocchi, Antonio, Bruno, Étienne, Hofhammer, Florian, Pescia, Gabriel, Rizk, Geovani, Dadi, Leello, Stoffl, Lucas, Ribeiro, Manoel Horta, Bovel, Matthieu, Pan, Yueyang, Radenovic, Aleksandra, Alahi, Alexandre, Mathis, Alexander, Bitbol, Anne-Florence, Faltings, Boi, Hébert, Cécile, Tuia, Devis, Maréchal, François, Candea, George, Carleo, Giuseppe, Chappelier, Jean-Cédric, Flammarion, Nicolas, Fürbringer, Jean-Marie, Pellet, Jean-Philippe, Aberer, Karl, Zdeborová, Lenka, Salathé, Marcel, Jaggi, Martin, Rajman, Martin, Payer, Mathias, Wyart, Matthieu, Gastpar, Michael, Ceriotti, Michele, Svensson, Ola, Lévêque, Olivier, Ienne, Paolo, Guerraoui, Rachid, West, Robert, Kashyap, Sanidhya, Piazza, Valerio, Simanis, Viesturs, Kuncak, Viktor, Cevher, Volkan, Schwaller, Philippe, Friedli, Sacha, Jermann, Patrick, Kaser, Tanja, Bosselut, Antoine
AI assistants are being increasingly used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes.
External link:
http://arxiv.org/abs/2408.11841
Author:
Harma, Simla Burcu, Chakraborty, Ayan, Kostenok, Elizaveta, Mishin, Danila, Ha, Dongho, Falsafi, Babak, Jaggi, Martin, Liu, Ming, Oh, Yunho, Subramanian, Suvinay, Yazdanbakhsh, Amir
The increasing size of deep neural networks necessitates effective model compression to improve computational efficiency and reduce their memory footprint. Sparsity and quantization are two prominent compression methods that have individually demonstrated…
External link:
http://arxiv.org/abs/2405.20935
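To ground the two compression methods being combined, here is a minimal sketch of one possible composition: magnitude pruning followed by symmetric uniform quantization of a weight tensor. The sparsity level, bit width, and ordering are illustrative choices, not the paper's recipe.

```python
import torch

def prune_then_quantize(w: torch.Tensor, sparsity=0.5, bits=8):
    # Sparsity: zero out the smallest-magnitude fraction of weights.
    k = int(w.numel() * sparsity)
    threshold = w.abs().flatten().kthvalue(k).values
    mask = w.abs() > threshold
    w_sparse = w * mask
    # Quantization: symmetric uniform quantizer over the surviving range.
    qmax = 2 ** (bits - 1) - 1
    scale = w_sparse.abs().max() / qmax
    q = torch.clamp(torch.round(w_sparse / scale), -qmax, qmax)
    return q.to(torch.int8), scale, mask  # dequantize with q * scale
```

Note that the composition order (prune-then-quantize versus quantize-then-prune) changes the resulting error, which is exactly the kind of interaction a joint study of the two methods has to account for.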
Recent research on the grokking phenomenon has illuminated the intricacies of neural networks' training dynamics and their generalization behaviors. Grokking refers to a sharp rise of the network's generalization accuracy on the test set, which occurs…
External link:
http://arxiv.org/abs/2405.19454
Author:
Hägele, Alexander, Bakouch, Elie, Kosson, Atli, Allal, Loubna Ben, Von Werra, Leandro, Jaggi, Martin
Scale has become a main ingredient in obtaining strong machine learning models. As a result, understanding a model's scaling properties is key to effectively designing both the right training setup as well as future generations of architectures. In this…
External link:
http://arxiv.org/abs/2405.18392
Author:
Allouah, Youssef, Koloskova, Anastasia, Firdoussi, Aymane El, Jaggi, Martin, Guerraoui, Rachid
Decentralized learning is appealing as it enables the scalable usage of large amounts of distributed data and resources (without resorting to any central entity), while promoting privacy since every user minimizes the direct exposure of their data. Yet…
External link:
http://arxiv.org/abs/2405.01031
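Learning without a central entity is usually formalized as gossip averaging: each node takes a local SGD step and then mixes its parameters with its graph neighbors through a mixing matrix W. The sketch below uses a 4-node ring with uniform weights; both the topology and the weights are illustrative assumptions, not the paper's setting.

```python
import torch

def gossip_round(params, grads, W, lr=0.1):
    """params: list of per-node parameter tensors; grads: matching gradients;
    W: row-stochastic mixing matrix (nonzero entries = graph edges)."""
    n = len(params)
    stepped = [p - lr * g for p, g in zip(params, grads)]  # local SGD step
    # Gossip: each node replaces its parameters with a weighted
    # average over its neighborhood (row i of W).
    return [sum(W[i][j] * stepped[j] for j in range(n)) for i in range(n)]

# Example: 4 nodes on a ring, each averaging with its two neighbors.
W = torch.tensor([[0.50, 0.25, 0.00, 0.25],
                  [0.25, 0.50, 0.25, 0.00],
                  [0.00, 0.25, 0.50, 0.25],
                  [0.25, 0.00, 0.25, 0.50]])
```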
Published in:
COLM 2024
We explore on-device self-supervised collaborative fine-tuning of large language models with limited local data availability. Taking inspiration from the collaborative learning community, we introduce three distinct trust-weighted gradient aggregation…
External link:
http://arxiv.org/abs/2404.09753
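Trust-weighted gradient aggregation can be pictured as follows: a client scores each peer's update by similarity to its own gradient (a proxy for data relevance) and averages with softmax weights. This cosine-similarity instantiation is a guess for illustration only; the paper proposes three specific schemes that this sketch does not claim to reproduce.

```python
import torch
import torch.nn.functional as F

def trust_weighted_aggregate(own_grad: torch.Tensor,
                             peer_grads: list[torch.Tensor],
                             temperature: float = 1.0) -> torch.Tensor:
    """Aggregate flattened gradients, trusting peers that point the same way."""
    grads = [own_grad] + peer_grads
    sims = torch.stack([
        F.cosine_similarity(own_grad.flatten(), g.flatten(), dim=0)
        for g in grads
    ])
    trust = torch.softmax(sims / temperature, dim=0)  # trust weights sum to 1
    return sum(w * g for w, g in zip(trust, grads))
```

The temperature controls how sharply the client down-weights dissimilar peers; at high temperature this degrades to plain averaging.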
Author:
Ashkboos, Saleh, Mohtashami, Amirkeivan, Croci, Maximilian L., Li, Bo, Jaggi, Martin, Alistarh, Dan, Hoefler, Torsten, Hensman, James
We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot rotates LLMs in a way that removes outliers from the hidden state without…
External link:
http://arxiv.org/abs/2404.00456
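The rotation idea can be shown concretely: for any orthogonal Q, W x = (W Q)(Qᵀ x), so folding Q into the weights and counter-rotating the activations changes nothing in full precision but spreads outlier mass across coordinates, making 4-bit uniform quantization of both sides far less lossy. In this sketch a random orthogonal matrix stands in for QuaRot's Hadamard-based rotations, and the quantizer is a simple fake-quant, not the paper's kernels.

```python
import torch

def random_orthogonal(d):
    q, _ = torch.linalg.qr(torch.randn(d, d))  # random orthogonal matrix
    return q

def fake_quant(x, bits=4):
    """Symmetric uniform quantization, returned in float for comparison."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max() / qmax
    return torch.round(x / scale).clamp(-qmax, qmax) * scale

d = 64
W, x = torch.randn(d, d), torch.randn(d)
x[0] = 50.0                                    # plant an activation outlier
Q = random_orthogonal(d)

y_exact = W @ x
y_naive = fake_quant(W) @ fake_quant(x)              # the outlier wrecks the scale
y_rotated = fake_quant(W @ Q) @ fake_quant(Q.T @ x)  # (W Q)(Q^T x) = W x exactly
print((y_naive - y_exact).norm(), (y_rotated - y_exact).norm())
```

Running this typically shows a much smaller error for the rotated path, since Qᵀ x distributes the single large coordinate over all dimensions before quantization.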