Showing 1 - 10 of 169 results for the search: "P. Owens, Andrew"
Author:
Geng, Daniel, Herrmann, Charles, Hur, Junhwa, Cole, Forrester, Zhang, Serena, Pfaff, Tobias, Lopez-Guevara, Tatiana, Doersch, Carl, Aytar, Yusuf, Rubinstein, Michael, Sun, Chen, Wang, Oliver, Owens, Andrew, Sun, Deqing
Motion control is crucial for generating expressive and compelling video content; however, most existing video generation models rely mainly on text prompts for control, which struggle to capture the nuances of dynamic actions and temporal composition…
External link:
http://arxiv.org/abs/2412.02700
Author:
Chen, Ziyang, Seetharaman, Prem, Russell, Bryan, Nieto, Oriol, Bourgin, David, Owens, Andrew, Salamon, Justin
Generating sound effects for videos often requires creating artistic sound effects that diverge significantly from real-life sources and flexible control in the sound design. To address this problem, we introduce MultiFoley, a model designed for video…
External link:
http://arxiv.org/abs/2411.17698
Author:
Park, Jeongsoo, Owens, Andrew
One of the key challenges of detecting AI-generated images is spotting images that have been created by previously unseen generative models. We argue that the limited diversity of the training data is a major obstacle to addressing this problem, and…
External link:
http://arxiv.org/abs/2411.04125
Author:
Rodriguez, Samanta, Dou, Yiming, Bogert, William van den, Oller, Miquel, So, Kevin, Owens, Andrew, Fazeli, Nima
Today's tactile sensors have a variety of different designs, making it challenging to develop general-purpose methods for processing touch signals. In this paper, we learn a unified representation that captures the shared information between different…
External link:
http://arxiv.org/abs/2410.11834
Author:
Shrivastava, Ayush, Owens, Andrew
We present a simple, self-supervised approach to the Tracking Any Point (TAP) problem. We train a global matching transformer to find cycle-consistent tracks through video via contrastive random walks, using the transformer's attention-based global matching…
External link:
http://arxiv.org/abs/2409.16288
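For the self-supervised point-tracking entry above, the following is a minimal sketch of a cycle-consistency objective for a contrastive random walk over per-frame point features. It is an illustration under assumptions only: the tensor shapes, temperature, and every function name here are hypothetical, not the paper's implementation.

```python
# Sketch: cycle-consistency loss for a contrastive random walk over video frames.
# All names and shapes are illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def transition(a, b, temperature=0.07):
    """Soft matching (random-walk transition) from points in frame a to points in frame b."""
    a = F.normalize(a, dim=-1)                            # (N, D)
    b = F.normalize(b, dim=-1)                            # (M, D)
    return F.softmax(a @ b.t() / temperature, dim=-1)     # (N, M), rows sum to 1

def cycle_consistency_loss(frames):
    """frames: list of (N, D) feature tensors for consecutive frames.
    Walk forward through the sequence and back; each point should return to itself."""
    n = frames[0].shape[0]
    forward = [transition(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    backward = [transition(frames[i + 1], frames[i]) for i in range(len(frames) - 1)]
    walk = torch.eye(n, device=frames[0].device)
    for step in forward + backward[::-1]:
        walk = walk @ step                                 # chain the transition matrices
    targets = torch.arange(n, device=frames[0].device)
    return F.nll_loss(torch.log(walk + 1e-8), targets)     # reward returning to the start point
```

The loss rewards walks that leave a point, traverse the video forward and then backward, and land back on the same point, which is what makes the objective self-supervised: no ground-truth tracks are needed.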
Speech sounds convey a great deal of information about the scenes, resulting in a variety of effects ranging from reverberation to additional ambient sounds. In this paper, we manipulate input speech to sound as though it was recorded within a different…
External link:
http://arxiv.org/abs/2409.14340
Modern incarnations of tactile sensors produce high-dimensional raw sensory feedback such as images, making it challenging to efficiently store, process, and generalize across sensors. To address these concerns, we introduce a novel implicit function…
External link:
http://arxiv.org/abs/2409.14592
Today's touch sensors come in many shapes and sizes. This has made it challenging to develop general-purpose touch processing methods, since models are generally tied to one specific sensor design. We address this problem by performing cross-modal pre…
External link:
http://arxiv.org/abs/2409.08269
Spectrograms are 2D representations of sound that look very different from the images found in our visual world. And natural images, when played as spectrograms, make unnatural sounds. In this paper, we show that it is possible to synthesize spectrograms…
External link:
http://arxiv.org/abs/2405.12221
We propose a simple strategy for masking image patches during visual-language contrastive learning that improves the quality of the learned representations and the training speed. During each iteration of training, we randomly mask clusters of visual…
External link:
http://arxiv.org/abs/2405.08815
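For the patch-masking entry above, here is an illustrative sketch of dropping clusters of visual patch tokens before a CLIP-style image encoder. The grid size, cluster shape, masking ratio, and function names are assumptions made for illustration, not the paper's method.

```python
# Sketch: mask *clusters* of image patches (rather than independent patches)
# before a CLIP-style image encoder. All parameters and names are illustrative.
import torch

def cluster_patch_mask(grid_h, grid_w, mask_ratio=0.5, cluster_size=3):
    """Return a boolean mask over a (grid_h x grid_w) patch grid.
    True = patch is dropped. Clusters are square neighborhoods around random seed patches."""
    mask = torch.zeros(grid_h, grid_w, dtype=torch.bool)
    target = int(mask_ratio * grid_h * grid_w)
    while mask.sum() < target:
        cy = torch.randint(0, grid_h, (1,)).item()
        cx = torch.randint(0, grid_w, (1,)).item()
        half = cluster_size // 2
        mask[max(0, cy - half):cy + half + 1, max(0, cx - half):cx + half + 1] = True
    return mask.flatten()                          # (grid_h * grid_w,)

# Usage in a training step: keep only the visible patch tokens, feed them to the
# image encoder, then compute the usual image-text contrastive (InfoNCE) loss.
patch_tokens = torch.randn(1, 14 * 14, 768)        # e.g. a ViT-B/16 on a 224x224 image
keep = ~cluster_patch_mask(14, 14, mask_ratio=0.5)
visible_tokens = patch_tokens[:, keep, :]          # only these go through the encoder
```

Masking contiguous clusters removes whole image regions rather than scattered pixels' worth of patches, and processing only the visible tokens is also what yields the training speedup mentioned in the snippet.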