Search results

Report

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training

Authors: Kosson, Atli, Messmer, Bettina, Jaggi, Martin

Learning Rate Warmup is a popular heuristic for training neural networks, especially at larger batch sizes, despite limited understanding of its benefits. Warmup decreases the update size $\Delta \mathbf{w}_t = \eta_t \mathbf{u}_t$ early in training

External link: http://arxiv.org/abs/2410.23922

Report

Improving Stochastic Cubic Newton with Momentum

Authors: Chayti, El Mahdi, Doikov, Nikita, Jaggi, Martin

We study stochastic second-order methods for solving general non-convex optimization problems. We propose using a special version of momentum to stabilize the stochastic gradient and Hessian estimates in Newton's method. We show that momentum provabl

External link: http://arxiv.org/abs/2410.19644

Report

Accurate and Data-Efficient Toxicity Prediction when Annotators Disagree

Authors: Jaggi, Harbani, Murali, Kashyap, Fleisig, Eve, Bıyık, Erdem

When annotators disagree, predicting the labels given by individual annotators can capture nuances overlooked by traditional label aggregation. We introduce three approaches to predicting individual annotator ratings on the toxicity of text by incorp

External link: http://arxiv.org/abs/2410.12217

Report

HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation

Authors: Zhou, Xinyu, Fan, Simin, Jaggi, Martin

Influence functions provide a principled method to assess the contribution of individual training samples to a specific target. Yet, their high computational costs limit their applications on large-scale models and datasets. Existing methods proposed

External link: http://arxiv.org/abs/2410.05090

Report

Digital Twin Ecosystem for Oncology Clinical Operations

Authors: Pandey, Himanshu, Amod, Akhil, Shivang, Jaggi, Kshitij, Garg, Ruchi, Jain, Abheet, Tantia, Vinayak

Artificial Intelligence (AI) and Large Language Models (LLMs) hold significant promise in revolutionizing healthcare, especially in clinical applications. Simultaneously, Digital Twin technology, which models and simulates complex systems, has gained

External link: http://arxiv.org/abs/2409.17650

Report

On-Device Collaborative Language Modeling via a Mixture of Generalists and Specialists

Authors: Fan, Dongyang, Messmer, Bettina, Jaggi, Martin

On-device LLMs have gained increasing attention for their ability to enhance privacy and provide a personalized user experience. To facilitate learning with private and scarce local data, federated learning has become a standard approach, though it i

External link: http://arxiv.org/abs/2409.13931

Report

CoBo: Collaborative Learning via Bilevel Optimization

Authors: Hashemi, Diba, He, Lie, Jaggi, Martin

Collaborative learning is an important tool to train multiple clients more effectively by enabling communication among clients. Identifying helpful clients, however, is challenging and often introduces significant overhead. In this paper, we mo

External link: http://arxiv.org/abs/2409.05539

Report

A New First-Order Meta-Learning Algorithm with Convergence Guarantees

Authors: Chayti, El Mahdi, Jaggi, Martin

Learning new tasks by drawing on prior experience gathered from other (related) tasks is a core property of any intelligent system. Gradient-based meta-learning, especially MAML and its variants, has emerged as a viable solution to accomplish this go

External link: http://arxiv.org/abs/2409.03682

Report

Could ChatGPT get an Engineering Degree? Evaluating Higher Education Vulnerability to AI Assistants

Authors: Borges, Beatriz, Foroutan, Negar, Bayazit, Deniz, Sotnikova, Anna, Montariol, Syrielle, Nazaretzky, Tanya, Banaei, Mohammadreza, Sakhaeirad, Alireza, Servant, Philippe, Neshaei, Seyed Parsa, Frej, Jibril, Romanou, Angelika, Weiss, Gail, Mamooler, Sepideh, Chen, Zeming, Fan, Simin, Gao, Silin, Ismayilzada, Mete, Paul, Debjit, Schöpfer, Alexandre, Janchevski, Andrej, Tiede, Anja, Linden, Clarence, Troiani, Emanuele, Salvi, Francesco, Behrens, Freya, Orsi, Giacomo, Piccioli, Giovanni, Sevel, Hadrien, Coulon, Louis, Pineros-Rodriguez, Manuela, Bonnassies, Marin, Hellich, Pierre, van Gerwen, Puck, Gambhir, Sankalp, Pirelli, Solal, Blanchard, Thomas, Callens, Timothée, Aoun, Toni Abi, Alonso, Yannick Calvino, Cho, Yuri, Chiappa, Alberto, Sclocchi, Antonio, Bruno, Étienne, Hofhammer, Florian, Pescia, Gabriel, Rizk, Geovani, Dadi, Leello, Stoffl, Lucas, Ribeiro, Manoel Horta, Bovel, Matthieu, Pan, Yueyang, Radenovic, Aleksandra, Alahi, Alexandre, Mathis, Alexander, Bitbol, Anne-Florence, Faltings, Boi, Hébert, Cécile, Tuia, Devis, Maréchal, François, Candea, George, Carleo, Giuseppe, Chappelier, Jean-Cédric, Flammarion, Nicolas, Fürbringer, Jean-Marie, Pellet, Jean-Philippe, Aberer, Karl, Zdeborová, Lenka, Salathé, Marcel, Jaggi, Martin, Rajman, Martin, Payer, Mathias, Wyart, Matthieu, Gastpar, Michael, Ceriotti, Michele, Svensson, Ola, Lévêque, Olivier, Ienne, Paolo, Guerraoui, Rachid, West, Robert, Kashyap, Sanidhya, Piazza, Valerio, Simanis, Viesturs, Kuncak, Viktor, Cevher, Volkan, Schwaller, Philippe, Friedli, Sacha, Jermann, Patrick, Kaser, Tanja, Bosselut, Antoine

AI assistants are being increasingly used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes.

External link: http://arxiv.org/abs/2408.11841

Report

Effective Interplay between Sparsity and Quantization: From Theory to Practice

Authors: Harma, Simla Burcu, Chakraborty, Ayan, Kostenok, Elizaveta, Mishin, Danila, Ha, Dongho, Falsafi, Babak, Jaggi, Martin, Liu, Ming, Oh, Yunho, Subramanian, Suvinay, Yazdanbakhsh, Amir

The increasing size of deep neural networks necessitates effective model compression to improve computational efficiency and reduce their memory footprint. Sparsity and quantization are two prominent compression methods that have individually demonst

External link: http://arxiv.org/abs/2405.20935
