Showing 1 - 10 of 105 for search: '"Sreenivasan, Kartik"'
Author:
Ankner, Zachary, Blakeney, Cody, Sreenivasan, Kartik, Marion, Max, Leavitt, Matthew L., Paul, Mansheej
In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the perplexity of a… A minimal sketch of perplexity-based data pruning follows the link below.
External link:
http://arxiv.org/abs/2405.20541
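As a rough illustration of the idea mentioned in the snippet above, the sketch below scores each document with a small language model's mean per-token negative log-likelihood, converts it to perplexity, and keeps a fraction of the corpus by rank. The scorer (`score_fn`) is a hypothetical stand-in rather than the paper's setup, and which end of the ranking to keep is left as a parameter.

```python
import math
from typing import Callable, List


def perplexity(nll_per_token: float) -> float:
    # Perplexity is the exponential of the mean per-token negative log-likelihood.
    return math.exp(nll_per_token)


def prune_by_perplexity(
    documents: List[str],
    score_fn: Callable[[str], float],  # hypothetical: mean per-token NLL from a small LM
    keep_fraction: float = 0.5,
    keep: str = "low",                 # keep the lowest- or highest-perplexity documents
) -> List[str]:
    """Rank documents by small-model perplexity and keep a fraction of them."""
    ranked = sorted(documents, key=lambda d: perplexity(score_fn(d)),
                    reverse=(keep == "high"))
    n_keep = max(1, int(len(documents) * keep_fraction))
    return ranked[:n_keep]


if __name__ == "__main__":
    # Usage with a stand-in scorer; a real setup would query a small pretrained LM.
    docs = ["a fluent, well-formed sentence.", "asdf qwer zxcv 1234 !!"]
    fake_nll = {docs[0]: 2.1, docs[1]: 7.8}
    print(prune_by_perplexity(docs, score_fn=fake_nll.get, keep_fraction=0.5))
```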
Author:
Cho, Jaewoong, Sreenivasan, Kartik, Lee, Keon, Mun, Kyunghoo, Yi, Soheun, Lee, Jeong-Gwan, Lee, Anna, Sohn, Jy-yong, Papailiopoulos, Dimitris, Lee, Kangwook
Contrastive learning has gained significant attention as a method for self-supervised learning. The contrastive loss function ensures that embeddings of positive sample pairs (e.g., different samples from the same class or different views of the same… A minimal sketch of such a contrastive loss follows the link below.
External link:
http://arxiv.org/abs/2307.05906
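To make the loss referred to in the snippet concrete, here is a minimal NumPy sketch of a generic InfoNCE-style objective: matching views sit on the diagonal of a similarity matrix and act as positives, while all other pairs in the batch act as negatives. This is a standard textbook formulation, not necessarily the specific loss studied in the paper.

```python
import numpy as np


def info_nce_loss(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """Minimal InfoNCE-style contrastive loss for paired views.

    z_a[i] and z_b[i] are embeddings of two views of the same sample
    (a positive pair); all other pairs in the batch act as negatives.
    """
    # L2-normalise so the dot product is a cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # pairwise similarities
    # Cross-entropy with the diagonal (the matching view) as the target class.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


# Example: two slightly perturbed views of 4 samples in an 8-dimensional space.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
print(info_nce_loss(z, z + 0.01 * rng.normal(size=(4, 8))))
```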
Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token prediction…
External link:
http://arxiv.org/abs/2307.03381
Chain-of-thought (CoT) is a method that enables language models to handle complex reasoning tasks by decomposing them into simpler steps. Despite its success, the underlying mechanics of CoT are not yet fully understood. In an attempt to shed light on… A toy chain-of-thought prompt is sketched after the link below.
External link:
http://arxiv.org/abs/2305.18869
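For readers unfamiliar with chain-of-thought prompting, the toy sketch below shows the basic mechanics the snippet alludes to: a worked, step-by-step exemplar is prepended to the question so the model produces intermediate steps before its final answer. The prompt text and helper are illustrative only and do not reflect the paper's experimental setup.

```python
# A toy exemplar showing a problem decomposed into intermediate steps.
COT_EXAMPLE = (
    "Q: A shop sells pens in packs of 12. How many pens are in 3 packs?\n"
    "A: Each pack has 12 pens. 3 packs have 3 * 12 = 36 pens. The answer is 36.\n"
)


def build_cot_prompt(question: str) -> str:
    """Prepend a worked, step-by-step exemplar so the model imitates the
    decomposition before producing its final answer."""
    return COT_EXAMPLE + f"Q: {question}\nA:"


print(build_cot_prompt("A box holds 8 apples. How many apples are in 5 boxes?"))
```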
Author:
Sreenivasan, Kartik, Sohn, Jy-yong, Yang, Liu, Grinde, Matthew, Nagle, Alliot, Wang, Hongyi, Xing, Eric, Lee, Kangwook, Papailiopoulos, Dimitris
Large neural networks can be pruned to a small fraction of their original size, with little loss in accuracy, by following a time-consuming "train, prune, re-train" approach. Frankle & Carbin conjecture that we can avoid this by training "lottery tickets"… A minimal magnitude-pruning sketch follows the link below.
External link:
http://arxiv.org/abs/2202.12002
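The "train, prune, re-train" recipe mentioned above typically prunes by weight magnitude. The sketch below shows that pruning step only; it is a generic illustration, not the procedure proposed in the paper (lottery-ticket experiments additionally rewind the surviving weights to their initial values and retrain with the mask held fixed).

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a binary mask keeping the largest-magnitude weights.

    This is the pruning step of the usual "train, prune, re-train" recipe.
    """
    k = int(weights.size * (1.0 - sparsity))  # number of weights to keep
    if k <= 0:
        return np.zeros_like(weights)
    # Threshold at the k-th largest absolute value.
    threshold = np.partition(np.abs(weights).ravel(), -k)[-k]
    return (np.abs(weights) >= threshold).astype(weights.dtype)


# Example: prune a random weight matrix to 90% sparsity.
w = np.random.default_rng(1).normal(size=(16, 16))
mask = magnitude_prune(w, sparsity=0.9)
print(f"kept {int(mask.sum())} of {mask.size} weights")
```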
A recent work by Ramanujan et al. (2020) provides significant empirical evidence that sufficiently overparameterized, random neural networks contain untrained subnetworks that achieve state-of-the-art accuracy on several predictive tasks. A follow-up…
External link:
http://arxiv.org/abs/2110.08996
It is well known that modern deep neural networks are powerful enough to memorize datasets even when the labels have been randomized. Recently, Vershynin (2020) settled a long-standing question by Baum (1988), proving that deep threshold networks…
External link:
http://arxiv.org/abs/2106.07724
Author:
Wang, Hongyi, Sreenivasan, Kartik, Rajput, Shashank, Vishwakarma, Harit, Agarwal, Saurabh, Sohn, Jy-yong, Lee, Kangwook, Papailiopoulos, Dimitris
Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training. The goal of a backdoor is to corrupt the performance of the trained model on specific sub-tasks (e.g., by classifying… A toy sketch of a poisoned federated-averaging round follows the link below.
External link:
http://arxiv.org/abs/2007.05084
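To make the threat model concrete, the toy sketch below shows plain federated averaging plus a model-replacement-style poisoned update in which one client scales a backdoor direction so that it survives averaging with many honest clients. The scaling factor and the notion of a "backdoor direction" are illustrative assumptions, not the attack analyzed in the paper.

```python
import numpy as np


def fed_avg(client_updates: list) -> np.ndarray:
    """Plain federated averaging of client model updates."""
    return np.mean(client_updates, axis=0)


def poisoned_update(honest_update: np.ndarray, backdoor_direction: np.ndarray,
                    boost: float = 10.0) -> np.ndarray:
    """Toy model-replacement-style attack: the adversary scales a backdoor
    direction so it is not washed out when averaged with honest clients."""
    return honest_update + boost * backdoor_direction


# Example round: 9 honest clients plus 1 attacker.
rng = np.random.default_rng(2)
honest = [rng.normal(scale=0.01, size=4) for _ in range(9)]
backdoor = np.array([1.0, 0.0, 0.0, 0.0])  # hypothetical backdoor direction
updates = honest + [poisoned_update(honest[0], backdoor)]
print("aggregated update:", fed_avg(updates))
```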
Academic article
This result cannot be displayed for unauthenticated users. To view it, please sign in.
Author:
Donaldson, Kayla R., Roach, Brian J., Ford, Judith M., Lai, Karen, Sreenivasan, Kartik K., Mathalon, Daniel H.
Published in:
In Biological Psychology January 2019 140:9-18