Transformer-based Image Compression

Author:	Lu, Ming; Guo, Peiyao; Shi, Huiqing; Cao, Chuntong; Ma, Zhan
Publication year:	2021
Subject:	Electrical Engineering and Systems Science - Image and Video Processing; Computer Science - Computer Vision and Pattern Recognition
Document type:	Working Paper
Description:	A Transformer-based Image Compression (TIC) approach is developed which reuses the canonical variational autoencoder (VAE) architecture with paired main and hyper encoder-decoders. Both the main and hyper encoders are composed of a sequence of neural transformation units (NTUs) that analyse and aggregate important information for a more compact representation of the input image, while the decoders mirror the encoder-side operations to generate the pixel-domain image reconstruction from the compressed bitstream. Each NTU consists of a Swin Transformer Block (STB) and a convolutional layer (Conv) to best embed both long-range and short-range information. In addition, a causal attention module (CAM) is devised for adaptive context modeling of latent features, utilizing both hyper and autoregressive priors. TIC rivals state-of-the-art approaches, including learnt image coding (LIC) methods based on deep convolutional neural networks (CNNs) and the handcrafted, rules-based intra profile of the recently approved Versatile Video Coding (VVC) standard, while requiring far fewer model parameters, e.g., up to 45% fewer than the leading-performance LIC method.
Database:	arXiv
External link:	http://arxiv.org/abs/2111.06707
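
The description mentions causal attention for autoregressive context modeling but gives no implementation details. The following is a minimal sketch of the general idea only, not the paper's CAM: each latent position attends solely to positions that precede it in coding order, so the context can be reproduced at the decoder. The names `causal_mask` and `causal_attention`, the strict "earlier positions only" rule, and the zero context for the first position are all assumptions made for illustration; fusion with the hyperprior is omitted.

```python
import math


def causal_mask(n):
    """mask[i][j] is True when position j precedes position i in coding order."""
    return [[j < i for j in range(n)] for i in range(n)]


def causal_attention(queries, keys, values):
    """Scaled dot-product attention restricted to earlier positions.

    Each argument is a list of n vectors (lists of floats). Returns one
    context vector per position. Position 0 has no decoded neighbours, so
    its context is all zeros (an assumption of this sketch).
    """
    n = len(queries)
    d = len(queries[0])
    width = len(values[0])
    mask = causal_mask(n)
    contexts = []
    for i in range(n):
        allowed = [j for j in range(n) if mask[i][j]]
        if not allowed:
            contexts.append([0.0] * width)
            continue
        # Similarity of the current query to every earlier key.
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            for j in allowed
        ]
        # Numerically stable softmax over the allowed positions only.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        contexts.append([
            sum(w * values[j][c] for w, j in zip(weights, allowed)) / total
            for c in range(width)
        ])
    return contexts


# Toy latent sequence of three 2-D vectors (made-up values).
latents = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = causal_attention(latents, latents, latents)
```

Because of the mask, changing a later latent leaves the context of every earlier position unchanged, which is the property an autoregressive entropy model relies on during decoding.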