Showing 1 - 10 of 89 for search: '"Li, Yanghao"'
Author:
Yin, Shanzhi, Xu, Tongda, Liang, Yongsheng, Wang, Yuanyuan, Li, Yanghao, Wang, Yan, Liu, Jingjing
With neural networks growing deeper and feature maps growing larger, limited communication bandwidth with external memory (or DRAM) and power constraints become a bottleneck in implementing network inference on mobile and edge devices. In this paper, …
External link:
http://arxiv.org/abs/2309.02855
Author:
Xu, Tongda, Zhang, Qian, Li, Yanghao, He, Dailan, Wang, Zhe, Wang, Yuanyuan, Qin, Hongwei, Wang, Yan, Liu, Jingjing, Zhang, Ya-Qin
We propose conditional perceptual quality, an extension of the perceptual quality defined in \citet{blau2018perception}, by conditioning it on user-defined information. Specifically, we extend the original perceptual quality $d(p_{X},p_{\hat{X}})$ to …
External link:
http://arxiv.org/abs/2308.08154
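The snippet cuts off before the extended definition, but the unconditional quantity it starts from can be written out, together with one natural conditional form. This is a hedged sketch only: the symbol $Y$ for the user-defined information and the expectation over it are assumptions, not taken from the abstract.

```latex
% Unconditional perceptual quality (Blau & Michaeli, 2018): a divergence
% between the source distribution and the reconstruction distribution.
d\left(p_{X},\, p_{\hat{X}}\right)

% One plausible conditional extension, given user-defined information Y:
% compare the conditional distributions and average over Y.
\mathbb{E}_{Y}\!\left[\, d\left(p_{X \mid Y},\, p_{\hat{X} \mid Y}\right) \right]
```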
Author:
Nguyen, Duy-Kien, Aggarwal, Vaibhav, Li, Yanghao, Oswald, Martin R., Kirillov, Alexander, Snoek, Cees G. M., Chen, Xinlei
In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning. Inspired by Masked Autoencoding (MAE), a generative pre-training baseline, we propose masked region autoencoding to learn from …
External link:
http://arxiv.org/abs/2306.05411
Author:
Ryali, Chaitanya, Hu, Yuan-Ting, Bolya, Daniel, Wei, Chen, Fan, Haoqi, Huang, Po-Yao, Aggarwal, Vaibhav, Chowdhury, Arkabandhu, Poursaeed, Omid, Hoffman, Judy, Malik, Jitendra, Li, Yanghao, Feichtenhofer, Christoph
Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance. While these components lead to effective accuracies and attractive FLOP counts, the added complexity actual…
External link:
http://arxiv.org/abs/2306.00989
In this paper, we first propose the concept of a strong idempotent codec, built on the notion of an idempotent codec. The idempotence of a codec refers to the stability of the codec under re-compression. Similarly, we define the strong idempotence of a codec as the stability of co…
External link:
http://arxiv.org/abs/2304.08269
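The abstract above defines idempotence as a codec's stability under re-compression. A minimal sketch of that check with a toy uniform-quantization codec (all names and the codec itself are illustrative, not from the paper):

```python
import numpy as np

Q = 0.1  # quantization step of the toy codec

def encode(x):
    # Toy lossy codec: uniform scalar quantization to step size Q.
    return np.round(x / Q).astype(np.int64)

def decode(code):
    return code.astype(np.float64) * Q

def is_idempotent(x, rounds=3):
    # A codec is idempotent if re-compressing its own reconstruction
    # leaves that reconstruction unchanged.
    y = decode(encode(x))
    for _ in range(rounds):
        y_next = decode(encode(y))
        if not np.allclose(y, y_next):
            return False
        y = y_next
    return True

x = np.random.default_rng(0).normal(size=1000)
print(is_idempotent(x))  # True: quantized values are fixed points
```

Uniform quantization is idempotent because every reconstruction is already a multiple of the step size, so re-encoding maps it to itself; a codec lacking this property drifts under repeated compression.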
Author:
Wei, Chen, Mangalam, Karttikeya, Huang, Po-Yao, Li, Yanghao, Fan, Haoqi, Xu, Hu, Wang, Huiyu, Xie, Cihang, Yuille, Alan, Feichtenhofer, Christoph
There has been a longstanding belief that generation can facilitate a true understanding of visual data. In line with this, we revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models. While d…
External link:
http://arxiv.org/abs/2304.03283
Video semantic segmentation (VSS) is a computationally expensive task due to the per-frame prediction for videos of high frame rates. In recent work, compact models or adaptive network strategies have been proposed for efficient VSS. However, they di…
External link:
http://arxiv.org/abs/2303.07224
Author:
Mangalam, Karttikeya, Fan, Haoqi, Li, Yanghao, Wu, Chao-Yuan, Xiong, Bo, Feichtenhofer, Christoph, Malik, Jitendra
We present Reversible Vision Transformers, a memory efficient architecture design for visual recognition. By decoupling the GPU memory requirement from the depth of the model, Reversible Vision Transformers enable scaling up architectures with effici…
External link:
http://arxiv.org/abs/2302.04869
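The memory decoupling described above rests on reversible residual couplings, where a block's inputs can be recomputed exactly from its outputs, so intermediate activations need not be cached for backpropagation. A minimal NumPy sketch of the coupling (the sub-functions F and G stand in for attention/MLP sub-blocks; everything here is illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
W_f = rng.normal(scale=0.1, size=(8, 8))
W_g = rng.normal(scale=0.1, size=(8, 8))

def F(x):  # stand-in for one sub-block (e.g. attention)
    return np.tanh(x @ W_f)

def G(x):  # stand-in for the other sub-block (e.g. MLP)
    return np.tanh(x @ W_g)

def forward(x1, x2):
    # Reversible coupling: each output mixes in a function of the other stream.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # Inputs are recomputed exactly from outputs, so the forward pass
    # does not need to store activations for every layer.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = rng.normal(size=(2, 4, 8))
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
print(np.allclose(x1, r1) and np.allclose(x2, r2))  # True
```

Because inversion is exact regardless of depth, activation memory stays constant as layers are stacked, at the cost of recomputing F and G during the backward pass.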
Author:
Huang, Po-Yao, Sharma, Vasu, Xu, Hu, Ryali, Chaitanya, Fan, Haoqi, Li, Yanghao, Li, Shang-Wen, Ghosh, Gargi, Malik, Jitendra, Feichtenhofer, Christoph
We present Masked Audio-Video Learners (MAViL) to train audio-visual representations. Our approach learns with three complementary forms of self-supervision: (1) reconstruction of masked audio and video input data, (2) intra- and inter-modal contrast…
External link:
http://arxiv.org/abs/2212.08071
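The inter-modal contrastive objective mentioned in point (2) is commonly instantiated as a symmetric InfoNCE loss between paired embeddings. A hedged NumPy sketch of that standard loss (the function name, temperature, and batch setup are illustrative assumptions, not MAViL's exact formulation):

```python
import numpy as np

def info_nce(za, zv, tau=0.07):
    # Symmetric InfoNCE between paired audio (za) and video (zv)
    # embeddings; row i of each batch is the positive pair.
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zv = zv / np.linalg.norm(zv, axis=1, keepdims=True)
    logits = za @ zv.T / tau  # (B, B) cosine similarities / temperature
    # Cross-entropy toward the diagonal, in both directions.
    log_p_a2v = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_p_v2a = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -0.5 * (np.mean(np.diag(log_p_a2v)) + np.mean(np.diag(log_p_v2a)))

rng = np.random.default_rng(0)
z = rng.normal(size=(16, 32))
# Perfectly aligned modalities drive the loss toward zero.
print(info_nce(z, z) < 0.1)  # True
```

Intra-modal contrast follows the same shape, with two augmented views of the same modality taking the roles of `za` and `zv`.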
We present Fast Language-Image Pre-training (FLIP), a simple and more efficient method for training CLIP. Our method randomly masks out and removes a large portion of image patches during training. Masking allows us to learn from more image-text pair…
External link:
http://arxiv.org/abs/2212.00794
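The key efficiency trick in the abstract is that masked patches are removed rather than replaced, so the encoder processes a shorter token sequence. A minimal sketch of that subsampling step (the function name and the 14x14 ViT patch grid are illustrative assumptions):

```python
import numpy as np

def random_mask_patches(patches, keep_ratio=0.5, rng=None):
    # Keep a random subset of patch tokens and drop the rest entirely,
    # shrinking the sequence the encoder must process.
    if rng is None:
        rng = np.random.default_rng()
    n = patches.shape[0]
    keep = np.sort(rng.permutation(n)[: int(n * keep_ratio)])
    return patches[keep], keep

# A 14x14 grid of ViT patch embeddings (196 tokens of width 768).
patches = np.zeros((196, 768), dtype=np.float32)
kept, idx = random_mask_patches(patches, keep_ratio=0.5,
                                rng=np.random.default_rng(0))
print(kept.shape)  # (98, 768): half the tokens, roughly half the compute
```

Since self-attention cost grows quadratically in sequence length, halving the tokens can cut encoder compute by more than half, letting more image-text pairs be seen per unit of training time.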