Showing 1 - 10 of 194
for search: '"Kim, Seungyeon"'
Author:
Godbole, Ameya, Monath, Nicholas, Kim, Seungyeon, Rawat, Ankit Singh, McCallum, Andrew, Zaheer, Manzil
In text generation, hallucinations refer to the generation of seemingly coherent text that contradicts established knowledge. One compelling hypothesis is that hallucinations occur when a language model is given a generation task outside its parametric knowledge…
External link:
http://arxiv.org/abs/2408.10490
Developing text-based robot trajectory generation models is made particularly difficult by the small dataset size, the high dimensionality of the trajectory space, and the inherent complexity of the text-conditional motion distribution. Recent manifold learning…
External link:
http://arxiv.org/abs/2407.19681
Author:
Narasimhan, Harikrishna, Jitkrittum, Wittawat, Rawat, Ankit Singh, Kim, Seungyeon, Gupta, Neha, Menon, Aditya Krishna, Kumar, Sanjiv
Cascades and speculative decoding are two common approaches to improving language models' inference efficiency. Both approaches involve interleaving models of different sizes, but via fundamentally distinct mechanisms: cascades employ a deferral rule…
External link:
http://arxiv.org/abs/2405.19261
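As a hedged illustration of the cascade mechanism this snippet describes (speculative decoding instead has the small model draft tokens that the large model verifies), here is a minimal Python sketch of a confidence-threshold deferral rule; the model callables, their (text, score) return signature, and the threshold are all assumptions, not the paper's method:

    def cascade_generate(prompt, small_model, large_model, threshold=0.8):
        # Hypothetical setup: each model returns (output_text, confidence_score).
        output, confidence = small_model(prompt)
        if confidence >= threshold:
            return output              # cheap path: keep the small model's answer
        return large_model(prompt)[0]  # deferral rule: invoke the large model

In practice the deferral signal might be token-level entropy or a learned router rather than a single scalar score.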
Despite the popularity and efficacy of knowledge distillation, there is limited understanding of why it helps. In order to study the generalization behavior of a distilled student, we propose a new theoretical framework that leverages supervision complexity…
External link:
http://arxiv.org/abs/2301.12245
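To make the distilled-student setup concrete, here is a minimal sketch of a standard knowledge-distillation loss in PyTorch; the temperature T and mixing weight alpha are illustrative, and this shows only generic distillation, not the paper's supervision-complexity framework:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        # Hard-label cross-entropy on the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        # KL divergence to the teacher's temperature-softened distribution,
        # rescaled by T^2 as is conventional.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        return alpha * hard + (1 - alpha) * soft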
Author:
Kim, Seungyeon, Rawat, Ankit Singh, Zaheer, Manzil, Jayasumana, Sadeep, Sadhanala, Veeranjaneyulu, Jitkrittum, Wittawat, Menon, Aditya Krishna, Fergus, Rob, Kumar, Sanjiv
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR). In this paper, we aim to improve distillation methods that pave the way for the resource-efficient deployment of such models in practice.
External link:
http://arxiv.org/abs/2301.12005
Author:
Kim, Seungyeon
A wide variety of text analysis applications are based on statistical machine learning techniques. The success of those applications is critically affected by how we represent a document. Learning an efficient document representation has two major challenges…
External link:
http://hdl.handle.net/1853/53946
Author:
Zaheer, Manzil, Rawat, Ankit Singh, Kim, Seungyeon, You, Chong, Jain, Himanshu, Veit, Andreas, Fergus, Rob, Kumar, Sanjiv
The remarkable performance gains realized by large pretrained models, e.g., GPT-3, hinge on the massive amounts of data they are exposed to during training. Analogously, distilling such large models to compact models for efficient deployment also necessitates…
External link:
http://arxiv.org/abs/2208.06825
Author:
Kim, Seungyeon, Glasner, Daniel, Ramalingam, Srikumar, Hsieh, Cho-Jui, Papineni, Kishore, Kumar, Sanjiv
It is generally believed that robust training of extremely large networks is critical to their success in real-world applications. However, when taken to the extreme, methods that promote robustness can hurt the model's sensitivity to rare or underrepresented…
External link:
http://arxiv.org/abs/2105.09394
Published in:
Frontiers in Neuroendocrinology, April 2024, Vol. 73
Author:
Bhojanapalli, Srinadh, Wilber, Kimberly, Veit, Andreas, Rawat, Ankit Singh, Kim, Seungyeon, Menon, Aditya, Kumar, Sanjiv
Standard training techniques for neural networks involve multiple sources of randomness, e.g., initialization, mini-batch ordering, and in some cases data augmentation. Given that neural networks are heavily over-parameterized in practice, such randomness…
External link:
http://arxiv.org/abs/2102.03349
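The randomness sources listed in this snippet (initialization, mini-batch ordering, data augmentation) can be pinned down with explicit seeds; below is a hedged PyTorch sketch of doing so. Seeding alone does not guarantee bitwise-identical predictions across runs or hardware, which is part of what makes the question nontrivial:

    import random
    import numpy as np
    import torch

    def seed_everything(seed: int = 0) -> torch.Generator:
        random.seed(seed)        # Python-level randomness (e.g., augmentation choices)
        np.random.seed(seed)     # NumPy-based shuffling / augmentation
        torch.manual_seed(seed)  # parameter initialization
        g = torch.Generator()
        g.manual_seed(seed)      # pass as DataLoader(..., generator=g)
        return g                 # to fix mini-batch ordering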