Showing 1 - 10 of 109 for the search: '"Hu, Edward"'
Author:
Hu, Edward S., Ahn, Kwangjun, Liu, Qinghua, Xu, Haoran, Tomar, Manan, Langford, Ada, Jayaraman, Dinesh, Lamb, Alex, Langford, John
We introduce the "Belief State Transformer", a next-token predictor that takes both a prefix and suffix as inputs, with a novel objective of predicting both the next token for the prefix and the previous token for the suffix. The Belief State Transformer…
External link:
http://arxiv.org/abs/2410.23506
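The snippet above describes the paired objective concretely enough to sketch how the two training targets are constructed. A minimal illustration, assuming integer token IDs and hypothetical split points `i` and `j`; the model and encoders themselves are not shown:

```python
def belief_state_targets(tokens, i, j):
    """Given a token sequence, a prefix end index i and a suffix start
    index j (i < j), return the prefix, the suffix, and the two targets
    the snippet describes: the next token after the prefix and the
    previous token before the suffix."""
    assert 0 <= i < j <= len(tokens)
    prefix = tokens[:i]
    suffix = tokens[j:]
    next_token_target = tokens[i]      # predicted from the prefix
    prev_token_target = tokens[j - 1]  # predicted from the suffix
    return prefix, suffix, next_token_target, prev_token_target

# Example: prefix = first 2 tokens, suffix = last 2 tokens
prefix, suffix, nxt, prev = belief_state_targets([10, 11, 12, 13, 14, 15], 2, 4)
# prefix=[10, 11], suffix=[14, 15], nxt=12, prev=13
```

This only illustrates target selection for one (prefix, suffix) pair; in training, many split points per sequence would be sampled.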
Reinforcement learning (RL) is an appealing paradigm for training intelligent agents, enabling policy acquisition from the agent's own autonomously acquired experience. However, the training process of RL is far from automatic, requiring extensive human…
External link:
http://arxiv.org/abs/2408.09807
We need to look at our shoelaces as we first learn to tie them but having mastered this skill, can do it from touch alone. We call this phenomenon "sensory scaffolding": observation streams that are not needed by a master might yet aid a novice learn…
External link:
http://arxiv.org/abs/2405.14853
Author:
Hu, Edward J., Jain, Moksh, Elmoznino, Eric, Kaddar, Younesse, Lajoie, Guillaume, Bengio, Yoshua, Malkin, Nikolay
Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This limits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest…
External link:
http://arxiv.org/abs/2310.04363
Author:
Soulos, Paul, Hu, Edward, McCurdy, Kate, Chen, Yunmo, Fernandez, Roland, Smolensky, Paul, Gao, Jianfeng
In the context of structure-to-structure transformation tasks, learning sequences of discrete symbolic operations poses significant challenges due to their non-differentiability. To facilitate the learning of these symbolic sequences, we introduce a…
External link:
http://arxiv.org/abs/2306.00751
Dropped into an unknown environment, what should an agent do to quickly learn about the environment and how to accomplish diverse tasks within it? We address this question within the goal-conditioned reinforcement learning paradigm, by identifying how…
External link:
http://arxiv.org/abs/2303.13002
Author:
Hu, Edward J., Malkin, Nikolay, Jain, Moksh, Everett, Katie, Graikos, Alexandros, Bengio, Yoshua
Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large number of possible configurations of the latents. A key tradeoff in modeling the posteriors over latents is between…
External link:
http://arxiv.org/abs/2302.06576
Physical interactions can often help reveal information that is not readily apparent. For example, we may tug at a table leg to evaluate whether it is built well, or turn a water bottle upside down to check that it is watertight. We propose to train…
External link:
http://arxiv.org/abs/2212.08961
Author:
Malkin, Nikolay, Lahlou, Salem, Deleu, Tristan, Ji, Xu, Hu, Edward, Everett, Katie, Zhang, Dinghuai, Bengio, Yoshua
This paper builds bridges between two families of probabilistic algorithms: (hierarchical) variational inference (VI), which is typically used to model distributions over continuous spaces, and generative flow networks (GFlowNets), which have been used…
External link:
http://arxiv.org/abs/2210.00580
Author:
Yang, Greg, Hu, Edward J., Babuschkin, Igor, Sidor, Szymon, Liu, Xiaodong, Farhi, David, Ryder, Nick, Pachocki, Jakub, Chen, Weizhu, Gao, Jianfeng
Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization (muP), many optimal HPs remain stable…
External link:
http://arxiv.org/abs/2203.03466
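The stability of optimal HPs under muP is what lets one tune a small model and reuse the result at scale. A rough sketch of the width-based learning-rate transfer commonly associated with muP, assuming Adam and matrix-like hidden weights (the base values below are hypothetical, not from the paper):

```python
def mup_hidden_lr(base_lr, base_width, width):
    """Transfer a hidden-layer Adam learning rate tuned at base_width to a
    model of a different width: under the commonly cited muP rule, the
    per-layer LR for matrix-like weights scales as 1 / width."""
    return base_lr * base_width / width

# Tune at width 256, then transfer to width 1024: the LR shrinks by 4x
lr_small = mup_hidden_lr(1e-3, 256, 256)   # unchanged at the tuning width
lr_large = mup_hidden_lr(1e-3, 256, 1024)  # 2.5e-4 at the target width
```

Only the width-dependent rescaling is shown; a full muP setup also adjusts initialization variances and output-layer multipliers, which this sketch omits.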