Showing 1 - 10 of 38 for search: '"Liu, Bingbin"'
Knowledge distillation leverages a teacher model to improve the training of a student model. A persistent challenge is that a better teacher does not always yield a better student, to which a common mitigation is to use additional supervision from…
External link:
http://arxiv.org/abs/2410.05464
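To make the snippet concrete, here is a minimal sketch of the classic soft-target distillation loss (Hinton-style), not the method proposed in this paper; the temperature T, weight alpha, and all names are illustrative:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha * CE(student, labels) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student_T = np.log(softmax(student_logits, T) + 1e-12)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12) - log_p_student_T), axis=-1)
    log_p_student = np.log(softmax(student_logits) + 1e-12)
    ce = -log_p_student[np.arange(len(labels)), labels]
    return np.mean(alpha * ce + (1 - alpha) * T**2 * kl)
```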
Author:
Liu, Bingbin, Bubeck, Sebastien, Eldan, Ronen, Kulkarni, Janardhan, Li, Yuanzhi, Nguyen, Anh, Ward, Rachel, Zhang, Yi
Small-scale models offer various computational advantages, and yet to what extent size is critical for problem-solving abilities remains an open question. Specifically for solving grade school math, the smallest model size so far required to break…
External link:
http://arxiv.org/abs/2312.09241
Interpretability methods aim to understand the algorithm implemented by a trained model (e.g., a Transformer) by examining various aspects of the model, such as the weight matrices or the attention patterns. In this work, through a combination of theoretical…
External link:
http://arxiv.org/abs/2312.01429
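As a concrete illustration of the "attention patterns" such methods examine, here is a toy single-head, causally masked attention computation; all dimensions and weights are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 8                      # toy sequence length and head dimension
X = rng.normal(size=(T, d))      # token representations
W_Q, W_K = rng.normal(size=(d, d)), rng.normal(size=(d, d))

Q, K = X @ W_Q, X @ W_K
scores = Q @ K.T / np.sqrt(d)                      # scaled dot-product scores
scores += np.triu(np.full((T, T), -np.inf), k=1)   # causal mask
pattern = np.exp(scores - scores.max(axis=-1, keepdims=True))
pattern /= pattern.sum(axis=-1, keepdims=True)     # each row sums to 1

print(pattern.round(2))  # the T x T attention pattern one would inspect
```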
Data augmentation is critical to the empirical success of modern self-supervised representation learning, such as contrastive learning and masked language modeling. However, a theoretical understanding of the exact role of augmentation remains limited…
External link:
http://arxiv.org/abs/2306.00788
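A minimal sketch of how augmentation enters such methods: two random views of the same input form a positive pair. The crop-and-flip augmentations here are illustrative stand-ins, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, crop=24):
    """Random crop + random horizontal flip (illustrative augmentations)."""
    H, W = img.shape[:2]
    y, x = rng.integers(0, H - crop + 1), rng.integers(0, W - crop + 1)
    view = img[y:y + crop, x:x + crop]
    return view[:, ::-1] if rng.random() < 0.5 else view

img = rng.random((32, 32, 3))
# A positive pair: same underlying content, two different views.
view_a, view_b = augment(img), augment(img)
```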
Why do large language models sometimes output factual inaccuracies and exhibit erroneous reasoning? The brittleness of these models, particularly when executing long chains of reasoning, currently seems to be an inevitable price to pay for their advanced…
External link:
http://arxiv.org/abs/2306.00946
Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine. However, Transformer models, while lacking recurrence, are able to perform such reasoning using far fewer…
External link:
http://arxiv.org/abs/2210.10749
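The few-layers-versus-many-steps contrast can be illustrated with parity: a recurrent computation needs T sequential steps, while an associative prefix scan reaches the same answer in O(log T) parallel stages. A sketch of that shortcut idea, not the paper's construction:

```python
import numpy as np

def parity_recurrent(bits):
    """T sequential steps, like a recurrent model simulating an automaton."""
    state, out = 0, []
    for b in bits:
        state ^= b
        out.append(state)
    return np.array(out)

def parity_logdepth(bits):
    """All prefix parities in O(log T) stages via a Hillis-Steele XOR scan,
    the kind of parallel 'shortcut' a shallow non-recurrent model can use."""
    x = np.array(bits)
    shift = 1
    while shift < len(x):
        y = x.copy()
        y[shift:] ^= x[:-shift]   # combine with the element `shift` positions back
        x = y
        shift *= 2
    return x

bits = np.random.default_rng(0).integers(0, 2, size=16)
assert (parity_recurrent(bits) == parity_logdepth(bits)).all()
```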
The vast majority of work in self-supervised learning, both theoretical and empirical (though mostly the latter), has largely focused on recovering good features for downstream tasks, with the definition of "good" often being intricately tied to the…
External link:
http://arxiv.org/abs/2202.09305
Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models. It has been empirically observed that the choice of the noise distribution is crucial for NCE's performance. However, such observations…
External link:
http://arxiv.org/abs/2110.11271
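A minimal sketch of the NCE objective itself: a binary logistic classifier separating data from noise using the log-density difference. The 1-D Gaussian-shaped model and standard-normal noise here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_model(x, theta):
    """Unnormalized log-density of a Gaussian-shaped model; theta = (mu, c),
    where c is a learned stand-in for the (unknown) log-normalizer."""
    mu, c = theta
    return -0.5 * (x - mu) ** 2 + c

def log_noise(x):
    """Standard normal noise density (the design choice NCE is sensitive to)."""
    return -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)

def nce_loss(theta, data, noise, nu=1.0):
    """Logistic loss: data labeled 1, noise labeled 0,
    with logits = log p_model - log p_noise - log(nu)."""
    def logit(x):
        return log_model(x, theta) - log_noise(x) - np.log(nu)
    # -log sigma(z) = logaddexp(0, -z); -log(1 - sigma(z)) = logaddexp(0, z)
    return (np.logaddexp(0.0, -logit(data)).mean()
            + nu * np.logaddexp(0.0, logit(noise)).mean())

data = rng.normal(loc=2.0, size=1000)   # samples from the unknown data distribution
noise = rng.normal(size=1000)           # samples from the chosen noise distribution
print(nce_loss((0.0, 0.0), data, noise))
```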
Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data. It has recently emerged as one of the leading learning paradigms in the absence of labels across many…
External link:
http://arxiv.org/abs/2103.02740
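The "classification task constructed from unlabeled data" can be made concrete with the widely used InfoNCE form, where each example must identify its own second view among the batch; a generic sketch, not tied to this paper's analysis:

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """Cross-entropy where row i of z1 must 'classify' row i of z2
    as its positive among all N candidates in the batch."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                   # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = len(z1)
    return -log_probs[np.arange(n), np.arange(n)].mean()  # labels = diagonal

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))  # two-view embeddings
print(info_nce(z1, z2))
```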
Author:
Liu, Bingbin, Adeli, Ehsan, Cao, Zhangjie, Lee, Kuan-Hui, Shenoi, Abhijeet, Gaidon, Adrien, Niebles, Juan Carlos
Reasoning over visual data is a desirable capability for robotics and vision-based applications. Such reasoning enables forecasting of the next events or actions in videos. In recent years, various models have been developed based on convolution operations…
External link:
http://arxiv.org/abs/2002.08945