Search results

Report

Fine-grained Controllable Video Generation via Object Appearance and Context

Authors: Huang, Hsin-Ping; Su, Yu-Chuan; Sun, Deqing; Jiang, Lu; Jia, Xuhui; Zhu, Yukun; Yang, Ming-Hsuan

Text-to-video generation has shown promising results. However, by taking only natural languages as input, users often face difficulties in providing detailed information to precisely control the model's output. In this work, we propose fine-grained c

External link: http://arxiv.org/abs/2312.02919

Report

Video Summarization: Towards Entity-Aware Captions

Authors: Ayyubi, Hammad A.; Liu, Tianqi; Nagrani, Arsha; Lin, Xudong; Zhang, Mingda; Arnab, Anurag; Han, Feng; Zhu, Yukun; Liu, Jialu; Chang, Shih-Fu

Existing popular video captioning benchmarks and models deal with generic captions devoid of specific person, place or organization named entities. In contrast, news videos present a challenging setting where the caption requires such named entities

External link: http://arxiv.org/abs/2312.02188

Report

Superpixel Transformers for Efficient Semantic Segmentation

Authors: Zhu, Alex Zihao; Mei, Jieru; Qiao, Siyuan; Yan, Hang; Zhu, Yukun; Chen, Liang-Chieh; Kretzschmar, Henrik

Semantic segmentation, which aims to classify every pixel in an image, is a key task in machine perception, with many applications across robotics and autonomous driving. Due to the high dimensionality of this task, most existing approaches use local

External link: http://arxiv.org/abs/2309.16889

Report

MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models

Authors: Yang, Chenglin; Qiao, Siyuan; Yu, Qihang; Yuan, Xiaoding; Zhu, Yukun; Yuille, Alan; Adam, Hartwig; Chen, Liang-Chieh

This paper presents MOAT, a family of neural networks that build on top of MObile convolution (i.e., inverted residual blocks) and ATtention. Unlike the current works that stack separate mobile convolution and transformer blocks, we effectively merge

External link: http://arxiv.org/abs/2210.01820

Report

kMaX-DeepLab: k-means Mask Transformer

Authors: Yu, Qihang; Wang, Huiyu; Qiao, Siyuan; Collins, Maxwell; Zhu, Yukun; Adam, Hartwig; Yuille, Alan; Chen, Liang-Chieh

The rise of transformers in vision tasks not only advances network backbone designs, but also starts a brand-new page to achieve end-to-end image recognition (e.g., object detection and panoptic segmentation). Originated from Natural Language Process

External link: http://arxiv.org/abs/2207.04044

Report

CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation

Authors: Yu, Qihang; Wang, Huiyu; Kim, Dahun; Qiao, Siyuan; Collins, Maxwell; Zhu, Yukun; Adam, Hartwig; Yuille, Alan; Chen, Liang-Chieh

We propose Clustering Mask Transformer (CMT-DeepLab), a transformer-based framework for panoptic segmentation designed around clustering. It rethinks the existing transformer architectures used in segmentation and detection; CMT-DeepLab considers the

External link: http://arxiv.org/abs/2206.08948

Report

Waymo Open Dataset: Panoramic Video Panoptic Segmentation

Authors: Mei, Jieru; Zhu, Alex Zihao; Yan, Xinchen; Yan, Hang; Qiao, Siyuan; Zhu, Yukun; Chen, Liang-Chieh; Kretzschmar, Henrik; Anguelov, Dragomir

Panoptic image segmentation is the computer vision task of finding groups of pixels in an image and assigning semantic classes and object instance identifiers to them. Research in image segmentation has become increasingly popular due to its critical

External link: http://arxiv.org/abs/2206.07704
