Showing 1 - 10 of 642 results for search: '"Ramchandran, Kannan"'
With hundreds of thousands of language models available on Huggingface today, efficiently evaluating and utilizing these models across various downstream tasks has become increasingly critical. Many existing methods repeatedly learn task-specific ...
External link: http://arxiv.org/abs/2410.02223
Recent work has shown that Transformers trained from scratch can successfully solve various arithmetic and algorithmic tasks, such as adding numbers and computing parity. While these Transformers generalize well on unseen inputs of the same length, ...
External link: http://arxiv.org/abs/2409.15647
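As a point of reference for the parity task mentioned in this record, here is a minimal Python sketch of how such training data is typically generated; the input encoding and the train/test lengths are illustrative assumptions, not taken from the paper.

    import random

    def make_parity_example(length):
        """Sample a random bit string together with its parity (XOR of all bits)."""
        bits = [random.randint(0, 1) for _ in range(length)]
        return bits, sum(bits) % 2

    # Length generalization: train on short inputs, evaluate on longer ones.
    train = [make_parity_example(10) for _ in range(1000)]
    test = [make_parity_example(40) for _ in range(1000)]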
Authors: Rajaraman, Nived, Bondaschi, Marco, Ramchandran, Kannan, Gastpar, Michael, Makkuva, Ashok Vardhan
Attention-based transformers have been remarkably successful at modeling generative processes across various domains and modalities. In this paper, we study the behavior of transformers on data drawn from k-th order Markov processes, where the conditional distribution of the next symbol depends only on the k preceding symbols ...
External link: http://arxiv.org/abs/2407.17686
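For context, a k-th order Markov process is defined by the property that each symbol depends only on the k symbols immediately preceding it; in standard notation (my phrasing, not necessarily the paper's):

    \[ P(x_t \mid x_{t-1}, \dots, x_1) = P(x_t \mid x_{t-1}, \dots, x_{t-k}) \]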
While there has been a large body of research attempting to circumvent tokenization for language modeling (Clark et al., 2022; Xue et al., 2022), the current consensus is that it is a necessary initial step for designing state-of-the-art performant language models ...
External link: http://arxiv.org/abs/2404.08335
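For readers unfamiliar with the term, tokenization maps raw text to a sequence of vocabulary pieces before the model ever sees it. Below is a toy greedy longest-match tokenizer with a made-up vocabulary; this is purely illustrative and unrelated to any specific model's tokenizer.

    def tokenize(text, vocab):
        """Greedy longest-match segmentation of text into vocabulary tokens."""
        tokens = []
        i = 0
        while i < len(text):
            for j in range(len(text), i, -1):  # try the longest piece first
                if text[i:j] in vocab:
                    tokens.append(text[i:j])
                    i = j
                    break
            else:
                tokens.append(text[i])  # unknown character: fall back to char level
                i += 1
        return tokens

    print(tokenize("unhappiness", {"un", "happi", "ness", "happy"}))
    # ['un', 'happi', 'ness']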
One of the key challenges in machine learning is to find interpretable representations of learned functions. The Möbius transform is essential for this purpose, as its coefficients correspond to unique importance scores for sets of input variables. ...
External link: http://arxiv.org/abs/2402.02631
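For reference, on the subset lattice the Möbius transform and its inverse take the standard form below, where f(S) is the function's value on input set S and the coefficients \hat{f}(T) play the role of the importance scores mentioned above (the paper's exact normalization and notation may differ):

    \[
    f(S) = \sum_{T \subseteq S} \hat{f}(T),
    \qquad
    \hat{f}(S) = \sum_{T \subseteq S} (-1)^{|S \setminus T|} f(T).
    \]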
Authors: Huang, Baihe, Zhu, Hanlin, Zhu, Banghua, Ramchandran, Kannan, Jordan, Michael I., Lee, Jason D., Jiao, Jiantao
We study statistical watermarking by formulating it as a hypothesis testing problem, a general framework which subsumes all previous statistical watermarking methods. Key to our formulation is a coupling of the output tokens and the rejection region ...
External link: http://arxiv.org/abs/2312.07930
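In the standard Neyman-Pearson language that such a formulation builds on, a detector fixes a rejection region R and trades off the two error types; schematically (my notation, not necessarily the paper's):

    \[
    H_0: \text{text is unwatermarked}, \quad H_1: \text{text is watermarked};
    \qquad
    \Pr_{H_0}[x \in \mathcal{R}] \le \alpha \ \text{(Type I error)},
    \quad
    \text{power} = \Pr_{H_1}[x \in \mathcal{R}].
    \]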
Large Language Models (LLMs) can acquire extensive world knowledge through pre-training on large corpora. However, due to exposure to low-quality data, LLMs may exhibit harmful behavior without aligning with human values. The dominant approach for ...
External link: http://arxiv.org/abs/2310.00212
Magnetic resonance imaging (MRI) exam protocols consist of multiple contrast-weighted images of the same anatomy to emphasize different tissue properties. Due to the long acquisition times required to collect fully sampled k-space measurements, it is ...
External link: http://arxiv.org/abs/2303.14795
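To make the k-space undersampling concrete, here is a small NumPy sketch of retrospective acceleration: take an image's 2D Fourier transform, keep a random subset of phase-encode lines, and reconstruct by zero-filling. The keep_fraction and the line-wise mask are simplifying assumptions for illustration, not the paper's sampling scheme.

    import numpy as np

    def undersample_kspace(image, keep_fraction=0.25, seed=0):
        """Simulate accelerated MRI by discarding a fraction of k-space lines."""
        rng = np.random.default_rng(seed)
        kspace = np.fft.fftshift(np.fft.fft2(image))
        keep = rng.random(kspace.shape[0]) < keep_fraction  # which lines survive
        kspace_under = kspace * keep[:, None]
        zero_filled = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace_under)))
        return kspace_under, zero_filled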
Pruning schemes have been widely used in practice to reduce the complexity of trained models with a massive number of parameters. In fact, several practical studies have shown that if a pruned model is fine-tuned with some gradient-based updates, it ...
External link: http://arxiv.org/abs/2303.11453
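As one concrete instance of the schemes this record discusses, a magnitude-pruning step followed by masked fine-tuning might look like the NumPy sketch below; the sparsity level and the choice of magnitude pruning are illustrative assumptions, since the paper's analysis may cover other pruning rules.

    import numpy as np

    def magnitude_prune(weights, sparsity=0.9):
        """Zero out the smallest-magnitude fraction of weights; return them with the mask."""
        threshold = np.quantile(np.abs(weights), sparsity)
        mask = np.abs(weights) > threshold
        return weights * mask, mask

    # Fine-tuning keeps pruned entries at zero by masking each gradient step:
    #     weights -= learning_rate * (gradient * mask)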
We consider the sequential decision-making problem where the mean outcome is a non-linear function of the chosen action. Compared with the linear model, two curious phenomena arise in non-linear models: first, in addition to the "learning phase" with ...
External link: http://arxiv.org/abs/2302.06025
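Schematically, the contrast with the linear setting is that the mean reward passes through a non-linearity; in one common parameterization (an illustrative choice, not necessarily the paper's model):

    \[
    \text{linear: } \mathbb{E}[y_t \mid a_t] = \langle \theta^*, a_t \rangle,
    \qquad
    \text{non-linear: } \mathbb{E}[y_t \mid a_t] = f(\langle \theta^*, a_t \rangle),
    \]

where a_t is the chosen action, \theta^* the unknown parameter, and f a known non-linear link function.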