Showing 1 - 7 of 7 results for search: '"Patrick, Mandela"'
Authors:
Patrick, Mandela, Campbell, Dylan, Asano, Yuki M., Misra, Ishan, Metze, Florian, Feichtenhofer, Christoph, Vedaldi, Andrea, Henriques, João F.
In video transformers, the time dimension is often treated in the same way as the two spatial dimensions. However, in a scene where objects or the camera may move, a physical point imaged at one location in frame $t$ may be entirely unrelated to what…
External link:
http://arxiv.org/abs/2106.05392
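The first sentence of this abstract describes joint space-time attention. In that scheme, a clip's T×H×W patch tokens are flattened into one sequence, so an offset in time is handled exactly like an offset in space. Below is a minimal pure-Python sketch of that baseline. It assumes each patch is already a feature vector, and for brevity it uses the raw features as queries, keys and values (a single head with no learned projections). Both simplifications are ours. The sketch illustrates the baseline the abstract questions, not the method the paper proposes.

```python
import math
from typing import List

Vector = List[float]


def flatten_space_time(video: List[List[List[Vector]]]) -> List[Vector]:
    """Flatten a [T][H][W] grid of patch features into one token sequence.

    Time is flattened exactly like the two spatial axes, so nothing in the
    sequence marks that two tokens come from different frames.
    """
    return [patch for frame in video for row in frame for patch in row]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_space_time_attention(video: List[List[List[Vector]]]) -> List[Vector]:
    """Scaled dot-product self-attention over all T*H*W tokens at once.

    Hypothetical simplification: queries, keys and values are the raw
    patch features (single head, no learned projections).
    """
    tokens = flatten_space_time(video)
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        # Every token attends to every token, in any frame, in the same way.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return outputs


# Usage: a tiny clip with T=2 frames of H=2 x W=2 patches, 3-dim features.
clip = [
    [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]],
    [[[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]], [[0.5, 0.0, 0.5], [1.0, 0.0, 1.0]]],
]
out = joint_space_time_attention(clip)
print(len(out), len(out[0]))  # prints: 8 3
```

Because the sequence is flat, a patch in frame 2 scores its match in frame 1 exactly as it would score a neighbouring patch in its own frame. This is the behaviour the abstract argues is a poor fit when objects or the camera move.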
Authors:
Patrick, Mandela, Asano, Yuki M., Huang, Bernie, Misra, Ishan, Metze, Florian, Henriques, Joao, Vedaldi, Andrea
The quality of the image representations obtained from self-supervised learning depends strongly on the type of data augmentations used in the learning formulation. Recent papers have ported these methods from still images to videos and found that le…
External link:
http://arxiv.org/abs/2103.10211
Authors:
Huang, Po-Yao, Patrick, Mandela, Hu, Junjie, Neubig, Graham, Metze, Florian, Hauptmann, Alexander
This paper studies zero-shot cross-lingual transfer of vision-language models. Specifically, we focus on multilingual text-to-video search and propose a Transformer-based model that learns contextualized multilingual multimodal embeddings. Under a ze…
External link:
http://arxiv.org/abs/2103.08849
Authors:
Patrick, Mandela, Huang, Po-Yao, Asano, Yuki, Metze, Florian, Hauptmann, Alexander, Henriques, João, Vedaldi, Andrea
The dominant paradigm for learning video-text representations, noise contrastive learning, increases the similarity of the representations of pairs of samples that are known to be related, such as text and video from the same sample, and pushes a…
External link:
http://arxiv.org/abs/2010.02824
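The noise contrastive objective described in this abstract can be illustrated with an InfoNCE-style loss over a batch of paired video and text embeddings. In each row, the matching caption is the positive and every other caption in the batch acts as a negative. This is a minimal pure-Python sketch that assumes L2-normalised embeddings and a hypothetical temperature of 0.07. It shows the standard baseline the abstract starts from, not the paper's own modification of it.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def info_nce(video: List[Vector], text: List[Vector], temperature: float = 0.07) -> float:
    """Video-to-text InfoNCE loss over a batch of paired embeddings.

    Row i of `video` and row i of `text` come from the same sample (the
    positive pair); every text j != i is treated as a negative for video i.
    """
    video = [l2_normalize(v) for v in video]
    text = [l2_normalize(t) for t in text]
    loss = 0.0
    for i, v in enumerate(video):
        # Cosine similarity of video i to every caption, scaled by temperature.
        logits = [sum(a * b for a, b in zip(v, t)) / temperature for t in text]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching caption as the target class:
        # minimising it pulls the positive pair together and pushes negatives apart.
        loss += log_sum_exp - logits[i]
    return loss / len(video)


# Usage: correctly paired embeddings versus the same captions shuffled.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)  # True: matched pairs give the lower loss
```

In this formulation, every other caption in the batch counts as a negative, even one that happens to describe similar content. The abstract's framing of "pairs of samples that are known to be related" points at that design choice.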
A large part of the current success of deep learning lies in the effectiveness of data, more precisely, of labelled data. Yet, labelling a dataset with human annotation continues to carry high costs, especially for videos. While in the image domain, r…
External link:
http://arxiv.org/abs/2006.13662
Authors:
Patrick, Mandela, Asano, Yuki M., Kuznetsova, Polina, Fong, Ruth, Henriques, João F., Zweig, Geoffrey, Vedaldi, Andrea
In the image domain, excellent representations can be learned by inducing invariance to content-preserving transformations via noise contrastive learning. In this paper, we generalize contrastive learning to a wider set of transformations, and their…
External link:
http://arxiv.org/abs/2003.04298
The problem of attribution is concerned with identifying the parts of an input that are responsible for a model's output. An important family of attribution methods is based on measuring the effect of perturbations applied to the input. In this paper…
External link:
http://arxiv.org/abs/1910.08485
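The perturbation-based family of attribution methods mentioned in this abstract can be illustrated with plain occlusion. Replace one input feature at a time with a baseline value, re-run the model, and record how much the output drops. This is a minimal pure-Python sketch. The linear `model`, its weights and the zero baseline are all hypothetical. The sketch shows the simplest member of that family, not the method the paper itself introduces.

```python
from typing import Callable, List

Vector = List[float]


def occlusion_attribution(model: Callable[[Vector], float], x: Vector,
                          baseline: float = 0.0) -> Vector:
    """Score each input feature by how much the output drops when it is perturbed.

    The perturbation is the simplest possible one: set a single feature to
    `baseline` and re-run the model. A larger score means the feature mattered more.
    """
    reference = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline
        scores.append(reference - model(perturbed))
    return scores


# Hypothetical model: a fixed linear scorer that relies mostly on feature 2.
weights = [0.1, 0.0, 2.0, -0.5]


def model(v: Vector) -> float:
    return sum(w * a for w, a in zip(weights, v))


x = [1.0, 1.0, 1.0, 1.0]
print(occlusion_attribution(model, x))  # largest score at index 2
```

Occluding one feature at a time costs one forward pass per feature and ignores interactions between features. Perturbation methods that optimise over whole regions of the input, rather than probing features singly, are one response to those limitations.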