Showing 1 - 10 of 38 for search: '"Liu, Bingbin"'
Knowledge distillation leverages a teacher model to improve the training of a student model. A persistent challenge is that a better teacher does not always yield a better student, to which a common mitigation is to use additional supervision from…
External link:
http://arxiv.org/abs/2410.05464
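To make the snippet concrete, here is a minimal sketch of the classic soft-target distillation loss (Hinton-style), not the method proposed in this paper; the temperature T, weight alpha, and all names are illustrative:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha * CE(student, labels) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student_T = np.log(softmax(student_logits, T) + 1e-12)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12) - log_p_student_T), axis=-1)
    log_p_student = np.log(softmax(student_logits) + 1e-12)
    ce = -log_p_student[np.arange(len(labels)), labels]
    return np.mean(alpha * ce + (1 - alpha) * T**2 * kl)
```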
Author:
Liu, Bingbin, Bubeck, Sebastien, Eldan, Ronen, Kulkarni, Janardhan, Li, Yuanzhi, Nguyen, Anh, Ward, Rachel, Zhang, Yi
Small-scale models offer various computational advantages, and yet to what extent size is critical for problem-solving abilities remains an open question. Specifically for solving grade school math, the smallest model size so far required to break…
External link:
http://arxiv.org/abs/2312.09241
Interpretability methods aim to understand the algorithm implemented by a trained model (e.g., a Transformer) by examining various aspects of the model, such as the weight matrices or the attention patterns. In this work, through a combination of theoretical…
External link:
http://arxiv.org/abs/2312.01429
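As a concrete illustration of the "attention patterns" such methods examine, here is a toy single-head, causally masked attention computation; all dimensions and weights are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 8                      # toy sequence length and head dimension
X = rng.normal(size=(T, d))      # token representations
W_Q, W_K = rng.normal(size=(d, d)), rng.normal(size=(d, d))

Q, K = X @ W_Q, X @ W_K
scores = Q @ K.T / np.sqrt(d)                      # scaled dot-product scores
scores += np.triu(np.full((T, T), -np.inf), k=1)   # causal mask
pattern = np.exp(scores - scores.max(axis=-1, keepdims=True))
pattern /= pattern.sum(axis=-1, keepdims=True)     # each row sums to 1

print(pattern.round(2))  # the T x T attention pattern one would inspect
```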
Data augmentation is critical to the empirical success of modern self-supervised representation learning, such as contrastive learning and masked language modeling. However, a theoretical understanding of the exact role of augmentation remains limited…
External link:
http://arxiv.org/abs/2306.00788
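A minimal sketch of how augmentation enters such methods: two random views of the same input form a positive pair. The crop-and-flip augmentations here are illustrative stand-ins, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, crop=24):
    """Random crop + random horizontal flip (illustrative augmentations)."""
    H, W = img.shape[:2]
    y, x = rng.integers(0, H - crop + 1), rng.integers(0, W - crop + 1)
    view = img[y:y + crop, x:x + crop]
    return view[:, ::-1] if rng.random() < 0.5 else view

img = rng.random((32, 32, 3))
# A positive pair: same underlying content, two different views.
view_a, view_b = augment(img), augment(img)
```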
Why do large language models sometimes output factual inaccuracies and exhibit erroneous reasoning? The brittleness of these models, particularly when executing long chains of reasoning, currently seems to be an inevitable price to pay for their advanced…
External link:
http://arxiv.org/abs/2306.00946
Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine. However, Transformer models, while lacking recurrence, are able to perform such reasoning using far fewer…
External link:
http://arxiv.org/abs/2210.10749
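The few-layers-versus-many-steps contrast can be illustrated with parity: a recurrent computation needs T sequential steps, while an associative prefix scan reaches the same answer in O(log T) parallel stages. A sketch of that shortcut idea, not the paper's construction:

```python
import numpy as np

def parity_recurrent(bits):
    """T sequential steps, like a recurrent model simulating an automaton."""
    state, out = 0, []
    for b in bits:
        state ^= b
        out.append(state)
    return np.array(out)

def parity_logdepth(bits):
    """All prefix parities in O(log T) stages via a Hillis-Steele XOR scan,
    the kind of parallel 'shortcut' a shallow non-recurrent model can use."""
    x = np.array(bits)
    shift = 1
    while shift < len(x):
        y = x.copy()
        y[shift:] ^= x[:-shift]   # combine with the element `shift` positions back
        x = y
        shift *= 2
    return x

bits = np.random.default_rng(0).integers(0, 2, size=16)
assert (parity_recurrent(bits) == parity_logdepth(bits)).all()
```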
The vast majority of work in self-supervised learning, both theoretical and empirical (though mostly the latter), has largely focused on recovering good features for downstream tasks, with the definition of "good" often being intricately tied to the…
External link:
http://arxiv.org/abs/2202.09305
Noise-contrastive estimation (NCE) is a statistically consistent method for learning unnormalized probabilistic models. It has been empirically observed that the choice of the noise distribution is crucial for NCE's performance. However, such observations…
External link:
http://arxiv.org/abs/2110.11271
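A minimal sketch of the NCE objective itself: a binary logistic classifier separating data from noise using the log-density difference. The 1-D Gaussian-shaped model and standard-normal noise here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_model(x, theta):
    """Unnormalized log-density of a Gaussian-shaped model; theta = (mu, c),
    where c is a learned stand-in for the (unknown) log-normalizer."""
    mu, c = theta
    return -0.5 * (x - mu) ** 2 + c

def log_noise(x):
    """Standard normal noise density (the design choice NCE is sensitive to)."""
    return -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)

def nce_loss(theta, data, noise, nu=1.0):
    """Logistic loss: data labeled 1, noise labeled 0,
    with logits = log p_model - log p_noise - log(nu)."""
    def logit(x):
        return log_model(x, theta) - log_noise(x) - np.log(nu)
    # -log sigma(z) = logaddexp(0, -z); -log(1 - sigma(z)) = logaddexp(0, z)
    return (np.logaddexp(0.0, -logit(data)).mean()
            + nu * np.logaddexp(0.0, logit(noise)).mean())

data = rng.normal(loc=2.0, size=1000)   # samples from the unknown data distribution
noise = rng.normal(size=1000)           # samples from the chosen noise distribution
print(nce_loss((0.0, 0.0), data, noise))
```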
Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data. It has recently emerged as one of the leading learning paradigms in the absence of labels across many…
External link:
http://arxiv.org/abs/2103.02740
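The "classification task constructed from unlabeled data" can be made concrete with the widely used InfoNCE form, where each example must identify its own second view among the batch; a generic sketch, not tied to this paper's analysis:

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """Cross-entropy where row i of z1 must 'classify' row i of z2
    as its positive among all N candidates in the batch."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                   # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = len(z1)
    return -log_probs[np.arange(n), np.arange(n)].mean()  # labels = diagonal

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))  # two-view embeddings
print(info_nce(z1, z2))
```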
Author:
Liu, Bingbin, Adeli, Ehsan, Cao, Zhangjie, Lee, Kuan-Hui, Shenoi, Abhijeet, Gaidon, Adrien, Niebles, Juan Carlos
Reasoning over visual data is a desirable capability for robotics and vision-based applications. Such reasoning enables forecasting of the next events or actions in videos. In recent years, various models have been developed based on convolution operations…
External link:
http://arxiv.org/abs/2002.08945