Showing 1 - 10 of 89 for search: '"Li, Yanghao"'
Author:
Yin, Shanzhi, Xu, Tongda, Liang, Yongsheng, Wang, Yuanyuan, Li, Yanghao, Wang, Yan, Liu, Jingjing
With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (or DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices. In this paper, …
External link:
http://arxiv.org/abs/2309.02855
Author:
Xu, Tongda, Zhang, Qian, Li, Yanghao, He, Dailan, Wang, Zhe, Wang, Yuanyuan, Qin, Hongwei, Wang, Yan, Liu, Jingjing, Zhang, Ya-Qin
We propose conditional perceptual quality, an extension of the perceptual quality defined in \citet{blau2018perception}, by conditioning it on user-defined information. Specifically, we extend the original perceptual quality $d(p_{X},p_{\hat{X}})$ to …
External link:
http://arxiv.org/abs/2308.08154
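The snippet cuts off before the extended definition, but the unconditional quantity it starts from can be written out, together with one natural conditional form. This is a hedged sketch only: the symbol $Y$ for the user-defined information and the expectation over it are assumptions, not taken from the abstract.

```latex
% Unconditional perceptual quality (Blau & Michaeli, 2018): a divergence
% between the source distribution and the reconstruction distribution.
d\left(p_{X},\, p_{\hat{X}}\right)

% One plausible conditional extension, given user-defined information Y:
% compare the conditional distributions and average over Y.
\mathbb{E}_{Y}\!\left[\, d\left(p_{X \mid Y},\, p_{\hat{X} \mid Y}\right) \right]
```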
Author:
Nguyen, Duy-Kien, Aggarwal, Vaibhav, Li, Yanghao, Oswald, Martin R., Kirillov, Alexander, Snoek, Cees G. M., Chen, Xinlei
In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning. Inspired by Masked Autoencoding (MAE), a generative pre-training baseline, we propose masked region autoencoding to learn from …
External link:
http://arxiv.org/abs/2306.05411
Author:
Ryali, Chaitanya, Hu, Yuan-Ting, Bolya, Daniel, Wei, Chen, Fan, Haoqi, Huang, Po-Yao, Aggarwal, Vaibhav, Chowdhury, Arkabandhu, Poursaeed, Omid, Hoffman, Judy, Malik, Jitendra, Li, Yanghao, Feichtenhofer, Christoph
Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance. While these components lead to effective accuracies and attractive FLOP counts, the added complexity actual…
External link:
http://arxiv.org/abs/2306.00989
In this paper, we first propose the concept of a strong idempotent codec, built on the notion of an idempotent codec. The idempotence of a codec refers to the stability of the codec under re-compression. Similarly, we define the strong idempotence of a codec as the stability of co…
External link:
http://arxiv.org/abs/2304.08269
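The abstract above defines idempotence as a codec's stability under re-compression. A minimal sketch of that check with a toy uniform-quantization codec (all names and the codec itself are illustrative, not from the paper):

```python
import numpy as np

Q = 0.1  # quantization step of the toy codec

def encode(x):
    # Toy lossy codec: uniform scalar quantization to step size Q.
    return np.round(x / Q).astype(np.int64)

def decode(code):
    return code.astype(np.float64) * Q

def is_idempotent(x, rounds=3):
    # A codec is idempotent if re-compressing its own reconstruction
    # leaves that reconstruction unchanged.
    y = decode(encode(x))
    for _ in range(rounds):
        y_next = decode(encode(y))
        if not np.allclose(y, y_next):
            return False
        y = y_next
    return True

x = np.random.default_rng(0).normal(size=1000)
print(is_idempotent(x))  # True: quantized values are fixed points
```

Uniform quantization is idempotent because every reconstruction is already a multiple of the step size, so re-encoding maps it to itself; a codec lacking this property drifts under repeated compression.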
Author:
Wei, Chen, Mangalam, Karttikeya, Huang, Po-Yao, Li, Yanghao, Fan, Haoqi, Xu, Hu, Wang, Huiyu, Xie, Cihang, Yuille, Alan, Feichtenhofer, Christoph
There has been a longstanding belief that generation can facilitate a true understanding of visual data. In line with this, we revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models. While d…
External link:
http://arxiv.org/abs/2304.03283
Video semantic segmentation (VSS) is a computationally expensive task due to the per-frame prediction for videos of high frame rates. In recent work, compact models or adaptive network strategies have been proposed for efficient VSS. However, they di…
External link:
http://arxiv.org/abs/2303.07224
Author:
Mangalam, Karttikeya, Fan, Haoqi, Li, Yanghao, Wu, Chao-Yuan, Xiong, Bo, Feichtenhofer, Christoph, Malik, Jitendra
We present Reversible Vision Transformers, a memory efficient architecture design for visual recognition. By decoupling the GPU memory requirement from the depth of the model, Reversible Vision Transformers enable scaling up architectures with effici…
External link:
http://arxiv.org/abs/2302.04869
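The memory decoupling described above rests on reversible residual couplings, where a block's inputs can be recomputed exactly from its outputs, so intermediate activations need not be cached for backpropagation. A minimal NumPy sketch of the coupling (the sub-functions F and G stand in for attention/MLP sub-blocks; everything here is illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
W_f = rng.normal(scale=0.1, size=(8, 8))
W_g = rng.normal(scale=0.1, size=(8, 8))

def F(x):  # stand-in for one sub-block (e.g. attention)
    return np.tanh(x @ W_f)

def G(x):  # stand-in for the other sub-block (e.g. MLP)
    return np.tanh(x @ W_g)

def forward(x1, x2):
    # Reversible coupling: each output mixes in a function of the other stream.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # Inputs are recomputed exactly from outputs, so the forward pass
    # does not need to store activations for every layer.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.normal(size=(2, 4, 8))
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
print(np.allclose(x1, r1) and np.allclose(x2, r2))  # True
```

Because inversion is exact regardless of depth, activation memory stays constant as layers are stacked, at the cost of recomputing F and G during the backward pass.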
Author:
Huang, Po-Yao, Sharma, Vasu, Xu, Hu, Ryali, Chaitanya, Fan, Haoqi, Li, Yanghao, Li, Shang-Wen, Ghosh, Gargi, Malik, Jitendra, Feichtenhofer, Christoph
We present Masked Audio-Video Learners (MAViL) to train audio-visual representations. Our approach learns with three complementary forms of self-supervision: (1) reconstruction of masked audio and video input data, (2) intra- and inter-modal contrast…
External link:
http://arxiv.org/abs/2212.08071
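The inter-modal contrastive objective mentioned in point (2) is commonly instantiated as a symmetric InfoNCE loss between paired embeddings. A hedged NumPy sketch of that standard loss (the function name, temperature, and batch setup are illustrative assumptions, not MAViL's exact formulation):

```python
import numpy as np

def info_nce(za, zv, tau=0.07):
    # Symmetric InfoNCE between paired audio (za) and video (zv)
    # embeddings; row i of each batch is the positive pair.
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zv = zv / np.linalg.norm(zv, axis=1, keepdims=True)
    logits = za @ zv.T / tau  # (B, B) cosine similarities / temperature
    # Cross-entropy toward the diagonal, in both directions.
    log_p_a2v = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_v2a = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -0.5 * (np.mean(np.diag(log_p_a2v)) + np.mean(np.diag(log_p_v2a)))

rng = np.random.default_rng(0)
z = rng.normal(size=(16, 32))
# Perfectly aligned modalities drive the loss toward zero.
print(info_nce(z, z) < 0.1)  # True
```

Intra-modal contrast follows the same shape, with two augmented views of the same modality taking the roles of `za` and `zv`.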
We present Fast Language-Image Pre-training (FLIP), a simple and more efficient method for training CLIP. Our method randomly masks out and removes a large portion of image patches during training. Masking allows us to learn from more image-text pair…
External link:
http://arxiv.org/abs/2212.00794
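The key efficiency trick in the abstract is that masked patches are removed rather than replaced, so the encoder processes a shorter token sequence. A minimal sketch of that subsampling step (the function name and the 14x14 ViT patch grid are illustrative assumptions):

```python
import numpy as np

def random_mask_patches(patches, keep_ratio=0.5, rng=None):
    # Keep a random subset of patch tokens and drop the rest entirely,
    # shrinking the sequence the encoder must process.
    if rng is None:
        rng = np.random.default_rng()
    n = patches.shape[0]
    keep = np.sort(rng.permutation(n)[: int(n * keep_ratio)])
    return patches[keep], keep

# A 14x14 grid of ViT patch embeddings (196 tokens of width 768).
patches = np.zeros((196, 768), dtype=np.float32)
kept, idx = random_mask_patches(patches, keep_ratio=0.5,
                                rng=np.random.default_rng(0))
print(kept.shape)  # (98, 768): half the tokens, roughly half the compute
```

Since self-attention cost grows quadratically in sequence length, halving the tokens can cut encoder compute by more than half, letting more image-text pairs be seen per unit of training time.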