Zobrazeno 1 - 10
of 612
pro vyhledávání: '"Cevher, Volkan"'
Autor:
Sarkar, Soumajyoti, Lausen, Leonard, Cevher, Volkan, Zha, Sheng, Brox, Thomas, Karypis, George
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling. These models use conditionally activated feedforward subnetworks in transformer blocks, allowing for a separation between total model
Externí odkaz:
http://arxiv.org/abs/2409.01483
Autor:
Borges, Beatriz, Foroutan, Negar, Bayazit, Deniz, Sotnikova, Anna, Montariol, Syrielle, Nazaretzky, Tanya, Banaei, Mohammadreza, Sakhaeirad, Alireza, Servant, Philippe, Neshaei, Seyed Parsa, Frej, Jibril, Romanou, Angelika, Weiss, Gail, Mamooler, Sepideh, Chen, Zeming, Fan, Simin, Gao, Silin, Ismayilzada, Mete, Paul, Debjit, Schöpfer, Alexandre, Janchevski, Andrej, Tiede, Anja, Linden, Clarence, Troiani, Emanuele, Salvi, Francesco, Behrens, Freya, Orsi, Giacomo, Piccioli, Giovanni, Sevel, Hadrien, Coulon, Louis, Pineros-Rodriguez, Manuela, Bonnassies, Marin, Hellich, Pierre, van Gerwen, Puck, Gambhir, Sankalp, Pirelli, Solal, Blanchard, Thomas, Callens, Timothée, Aoun, Toni Abi, Alonso, Yannick Calvino, Cho, Yuri, Chiappa, Alberto, Sclocchi, Antonio, Bruno, Étienne, Hofhammer, Florian, Pescia, Gabriel, Rizk, Geovani, Dadi, Leello, Stoffl, Lucas, Ribeiro, Manoel Horta, Bovel, Matthieu, Pan, Yueyang, Radenovic, Aleksandra, Alahi, Alexandre, Mathis, Alexander, Bitbol, Anne-Florence, Faltings, Boi, Hébert, Cécile, Tuia, Devis, Maréchal, François, Candea, George, Carleo, Giuseppe, Chappelier, Jean-Cédric, Flammarion, Nicolas, Fürbringer, Jean-Marie, Pellet, Jean-Philippe, Aberer, Karl, Zdeborová, Lenka, Salathé, Marcel, Jaggi, Martin, Rajman, Martin, Payer, Mathias, Wyart, Matthieu, Gastpar, Michael, Ceriotti, Michele, Svensson, Ola, Lévêque, Olivier, Ienne, Paolo, Guerraoui, Rachid, West, Robert, Kashyap, Sanidhya, Piazza, Valerio, Simanis, Viesturs, Kuncak, Viktor, Cevher, Volkan, Schwaller, Philippe, Friedli, Sacha, Jermann, Patrick, Kaser, Tanja, Bosselut, Antoine
AI assistants are being increasingly used by students enrolled in higher education institutions. While these tools provide opportunities for improved teaching and education, they also pose significant challenges for assessment and learning outcomes.
Externí odkaz:
http://arxiv.org/abs/2408.11841
This paper rethinks Sharpness-Aware Minimization (SAM), which is originally formulated as a zero-sum game where the weights of a network and a bounded perturbation try to minimize/maximize, respectively, the same differentiable loss. To fundamentally
Externí odkaz:
http://arxiv.org/abs/2407.12993
Autor:
Park, Youngsuk, Budhathoki, Kailash, Chen, Liangfu, Kübler, Jonas, Huang, Jiaji, Kleindessner, Matthäus, Huan, Jun, Cevher, Volkan, Wang, Yida, Karypis, George
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI across various industries. Industry and research community have witnessed a large number of new applications
Externí odkaz:
http://arxiv.org/abs/2407.09111
Cutting plane methods are a fundamental approach for solving integer linear programs (ILPs). In each iteration of such methods, additional linear constraints (cuts) are introduced to the constraint set with the aim of excluding the previous fractiona
Externí odkaz:
http://arxiv.org/abs/2406.18781
Given a sequence of functions $f_1,\ldots,f_n$ with $f_i:\mathcal{D}\mapsto \mathbb{R}$, finite-sum minimization seeks a point ${x}^\star \in \mathcal{D}$ minimizing $\sum_{j=1}^n f_j(x)/n$. In this work, we propose a key twist into the finite-sum mi
Externí odkaz:
http://arxiv.org/abs/2406.04731
This paper studies kernel ridge regression in high dimensions under covariate shifts and analyzes the role of importance re-weighting. We first derive the asymptotic expansion of high dimensional kernels under covariate shifts. By a bias-variance dec
Externí odkaz:
http://arxiv.org/abs/2406.03171
EEG-based seizure detection models face challenges in terms of inference speed and memory efficiency, limiting their real-time implementation in clinical devices. This paper introduces a novel graph-based residual state update mechanism (REST) for re
Externí odkaz:
http://arxiv.org/abs/2406.16906
Denoising Diffusion Probabilistic Models (DDPMs) exhibit remarkable capabilities in image generation, with studies suggesting that they can generalize by composing latent factors learned from the training data. In this work, we go further and study D
Externí odkaz:
http://arxiv.org/abs/2405.19201
Clustering nodes in heterophilous graphs presents unique challenges due to the asymmetric relationships often overlooked by traditional methods, which moreover assume that good clustering corresponds to high intra-cluster and low inter-cluster connec
Externí odkaz:
http://arxiv.org/abs/2405.17050