Showing 1 - 10 of 47 for search: '"Oord, Aaron van den"'
In this paper we propose a new generative model of text, Step-unrolled Denoising Autoencoder (SUNDAE), that does not rely on autoregressive models. Similarly to denoising diffusion techniques, SUNDAE is repeatedly applied on a sequence of tokens, sta…
External link:
http://arxiv.org/abs/2112.06749
Author:
Wang, Luyu, Luc, Pauline, Wu, Yan, Recasens, Adria, Smaira, Lucas, Brock, Andrew, Jaegle, Andrew, Alayrac, Jean-Baptiste, Dieleman, Sander, Carreira, Joao, Oord, Aaron van den
The ability to learn universal audio representations that can solve diverse speech, music, and environment tasks can spur many applications that require general sound content understanding. In this work, we introduce a holistic audio representation e…
External link:
http://arxiv.org/abs/2111.12124
Self-supervised learning holds promise in leveraging large amounts of unlabeled data, however much of its progress has thus far been limited to highly curated pre-training data such as ImageNet. We explore the effects of contrastive learning from lar…
External link:
http://arxiv.org/abs/2105.08054
We present a multimodal framework to learn general audio representations from videos. Existing contrastive audio representation learning methods mainly focus on using the audio modality alone during training. In this work, we show that additional inf…
External link:
http://arxiv.org/abs/2104.12807
Author:
Hénaff, Olivier J., Koppula, Skanda, Alayrac, Jean-Baptiste, Oord, Aaron van den, Vinyals, Oriol, Carreira, João
Self-supervised pretraining has been shown to yield powerful representations for transfer learning. These performance gains come at a large computational cost however, with state-of-the-art methods requiring an order of magnitude more computation tha…
External link:
http://arxiv.org/abs/2103.10957
Author:
Wang, Luyu, Oord, Aaron van den
Recent advances suggest the advantage of multi-modal training in comparison with single-modal methods. In contrast to this view, in our work we find that similar gain can be obtained from training with different formats of a single modality. In parti…
External link:
http://arxiv.org/abs/2103.06508
Unsupervised speech representation learning has shown remarkable success at finding representations that correlate with phonetic structures and improve downstream speech recognition performance. However, most research has been focused on evaluating t…
External link:
http://arxiv.org/abs/2001.11128
Author:
Gregor, Karol, Rezende, Danilo Jimenez, Besse, Frederic, Wu, Yan, Merzic, Hamza, Oord, Aaron van den
When agents interact with a complex environment, they must form and maintain beliefs about the relevant aspects of that environment. We propose a way to efficiently train expressive generative models in complex environments. We show that a predictive…
External link:
http://arxiv.org/abs/1906.09237
We explore the use of Vector Quantized Variational AutoEncoder (VQ-VAE) models for large scale image generation. To this end, we scale and enhance the autoregressive priors used in VQ-VAE to generate synthetic samples of much higher coherence and fid…
External link:
http://arxiv.org/abs/1906.00446
Author:
Hénaff, Olivier J., Srinivas, Aravind, De Fauw, Jeffrey, Razavi, Ali, Doersch, Carl, Eslami, S. M. Ali, Oord, Aaron van den
Human observers can learn to recognize new categories of images from a handful of examples, yet doing so with artificial ones remains an open challenge. We hypothesize that data-efficient recognition is enabled by representations which make the varia…
External link:
http://arxiv.org/abs/1905.09272