Showing 1 - 10 of 29 for search: '"Yan, Wilson"'
Efficient video tokenization remains a key bottleneck in learning general-purpose vision models that are capable of processing long video sequences. Prevailing approaches are restricted to encoding videos to a fixed number of tokens, where too few tokens…
External link: http://arxiv.org/abs/2410.08368
Current language models fall short in understanding aspects of the world not easily described in words, and struggle with complex, long-form tasks. Video sequences offer valuable temporal information absent in language and static images, making them…
External link: http://arxiv.org/abs/2402.08268
Current methods in training and benchmarking vision models exhibit an over-reliance on passive, curated datasets. Although models trained on these datasets have shown strong performance in a wide variety of tasks such as classification, detection, and…
External link: http://arxiv.org/abs/2306.10190
Author:
Escontrela, Alejandro, Adeniji, Ademi, Yan, Wilson, Jain, Ajay, Peng, Xue Bin, Goldberg, Ken, Lee, Youngwoon, Hafner, Danijar, Abbeel, Pieter
Specifying reward signals that allow agents to learn complex behaviors is a long-standing challenge in reinforcement learning. A promising approach is to extract preferences for behaviors from unlabeled videos, which are widely available on the internet…
External link: http://arxiv.org/abs/2305.14343
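One way a reward can be extracted from unlabeled videos, in the spirit of this abstract, is to score an agent's observations by how likely a pretrained video prediction model finds each new frame given the frames before it. A minimal hypothetical sketch; the `log_prob(context, next_frame)` interface is an assumption for illustration, not the paper's API:

```python
import torch

def video_likelihood_reward(video_model, frames):
    """Score each transition by the pretrained video model's log-likelihood
    of the next frame given the frames so far; higher likelihood means the
    behavior looks more like the unlabeled reference videos.
    `video_model.log_prob(context, next_frame)` is a hypothetical interface."""
    rewards = []
    for t in range(1, len(frames)):
        with torch.no_grad():
            rewards.append(video_model.log_prob(frames[:t], frames[t]))
    return torch.stack(rewards)  # one scalar reward per transition
```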
Recent progress in scaling up large language models has shown impressive capabilities in performing few-shot learning across a wide range of text-based tasks. However, a key limitation is that these language models fundamentally lack visual perception…
External link: http://arxiv.org/abs/2302.00902
To generate accurate videos, algorithms have to understand the spatial and temporal dependencies in the world. Current algorithms enable accurate predictions over short horizons but tend to suffer from temporal inconsistencies. When generated content…
External link: http://arxiv.org/abs/2210.02396
In this work, we present Patch-based Object-centric Video Transformer (POVT), a novel region-based video generation architecture that leverages object-centric information to efficiently model temporal dynamics in videos. We build upon prior work in video…
External link: http://arxiv.org/abs/2206.04003
We present VideoGPT: a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos. VideoGPT uses VQ-VAE that learns downsampled discrete latent representations of a raw video by employing 3D convolutions and axial self-attention…
External link: http://arxiv.org/abs/2104.10157
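The abstract describes a VQ-VAE that downsamples raw video into discrete latents with 3D convolutions. A minimal PyTorch sketch of that encoding step, with illustrative layer sizes chosen for the example rather than the paper's actual architecture:

```python
import torch
import torch.nn as nn

class VideoEncoderVQ(nn.Module):
    """Downsample a raw video with 3D convolutions, then vector-quantize
    each latent vector against a learned codebook (VQ-VAE style)."""
    def __init__(self, in_channels=3, hidden=64, codebook_size=1024, code_dim=64):
        super().__init__()
        # Two stride-2 3D convolutions halve T, H, and W twice each.
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, hidden, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv3d(hidden, code_dim, kernel_size=4, stride=2, padding=1),
        )
        self.codebook = nn.Embedding(codebook_size, code_dim)

    def forward(self, video):  # video: (B, C, T, H, W)
        z = self.encoder(video)  # (B, D, T', H', W')
        flat = z.permute(0, 2, 3, 4, 1).reshape(-1, z.shape[1])  # (N, D)
        # Nearest-codebook-entry lookup: these integer ids are the discrete
        # tokens a downstream autoregressive transformer would model.
        dists = torch.cdist(flat, self.codebook.weight)  # (N, K)
        ids = dists.argmin(dim=1)
        return ids.view(z.shape[0], *z.shape[2:])  # (B, T', H', W')

# Example: an 8-frame 64x64 RGB clip becomes a (2, 16, 16) grid of token ids.
tokens = VideoEncoderVQ()(torch.randn(1, 3, 8, 64, 64))
print(tokens.shape)  # torch.Size([1, 2, 16, 16])
```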
Using visual model-based learning for deformable object manipulation is challenging due to difficulties in learning plannable visual representations along with complex dynamic models. In this work, we propose a new learning framework that jointly optimizes…
External link: http://arxiv.org/abs/2003.05436
Deep autoregressive models are among the most powerful generative models today, achieving state-of-the-art bits per dim. However, they are at a strict disadvantage compared to latent variable models when it comes to controlled sample generation.
External link: http://arxiv.org/abs/1912.05015
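"Bits per dim," the metric this abstract cites, normalizes a model's total negative log-likelihood by the number of data dimensions and converts nats to bits, so likelihood models trained on different input sizes can be compared. A quick worked computation:

```python
import math

def bits_per_dim(nll_nats: float, num_dims: int) -> float:
    """Convert a total negative log-likelihood in nats into bits per dimension:
    divide by the number of dimensions and change the log base from e to 2."""
    return nll_nats / (num_dims * math.log(2))

# Example: a 32x32 RGB image has 32 * 32 * 3 = 3072 dimensions; a total NLL
# of 6560 nats corresponds to roughly 3.08 bits/dim.
print(round(bits_per_dim(6560.0, 32 * 32 * 3), 2))  # 3.08
```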