Search results

Report

LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression

Authors: Chen, Jieneng; Ye, Luoxin; He, Ju; Wang, Zhao-Yang; Khashabi, Daniel; Yuille, Alan

While significant advancements have been made in compressed representations for text embeddings in large language models (LLMs), the compression of visual tokens in large multi-modal models (LMMs) has remained a largely overlooked area. In this work,

External link: http://arxiv.org/abs/2406.20092

Report

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Authors: Liu, Qihao; Zeng, Zhanpeng; He, Ju; Yu, Qihang; Shen, Xiaohui; Chen, Liang-Chieh

This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation.

External link: http://arxiv.org/abs/2406.09416

Report

A Simple Video Segmenter by Tracking Objects Along Axial Trajectories

Authors: He, Ju; Yu, Qihang; Shin, Inkyu; Deng, Xueqing; Yuille, Alan; Shen, Xiaohui; Chen, Liang-Chieh

Video segmentation requires consistently segmenting and tracking objects over time. Due to the quadratic dependency on input size, directly applying self-attention to video segmentation with high-resolution input features poses significant challenges

External link: http://arxiv.org/abs/2311.18537

Report

Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP

Authors: Yu, Qihang; He, Ju; Deng, Xueqing; Shen, Xiaohui; Chen, Liang-Chieh

Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing objects from an open set of categories. One way to address this challenge is to leverage multi-modal models, such as CLIP, to provide image and text features in a

External link: http://arxiv.org/abs/2308.02487

Report

Compositor: Bottom-up Clustering and Compositing for Robust Part and Object Segmentation

Authors: He, Ju; Chen, Jieneng; Lin, Ming-Xian; Yu, Qihang; Yuille, Alan

In this work, we present a robust approach for joint part and object segmentation. Specifically, we reformulate object and part segmentation as an optimization problem and build a hierarchical feature representation including pixel, part, and object-

External link: http://arxiv.org/abs/2306.07404

Report

PartImageNet: A Large, High-Quality Dataset of Parts

Authors: He, Ju; Yang, Shuo; Yang, Shaokang; Kortylewski, Adam; Yuan, Xiaoding; Chen, Jie-Neng; Liu, Shuai; Yang, Cheng; Yu, Qihang; Yuille, Alan

It is natural to represent objects in terms of their parts. This has the potential to improve the performance of algorithms for object recognition and segmentation but can also help for downstream tasks like activity recognition. Research on part-bas

External link: http://arxiv.org/abs/2112.00933

Report

OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images

Authors: Zhao, Bingchen; Yu, Shaozuo; Ma, Wufei; Yu, Mingxin; Mei, Shenxiao; Wang, Angtian; He, Ju; Yuille, Alan; Kortylewski, Adam

Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors. We introd

External link: http://arxiv.org/abs/2111.14341

Report

Learning from Temporal Gradient for Semi-supervised Action Recognition

Authors: Xiao, Junfei; Jing, Longlong; Zhang, Lin; He, Ju; She, Qi; Zhou, Zongwei; Yuille, Alan; Li, Yingwei

Semi-supervised video action recognition tends to enable deep neural networks to achieve remarkable performance even with very limited labeled data. However, existing methods are mainly transferred from current image-based methods (e.g., FixMatch). W

External link: http://arxiv.org/abs/2111.13241

Report

TransMix: Attend to Mix for Vision Transformers

Authors: Chen, Jie-Neng; Sun, Shuyang; He, Ju; Torr, Philip; Yuille, Alan; Bai, Song

Mixup-based augmentation has been found to be effective for generalizing models during training, especially for Vision Transformers (ViTs) since they can easily overfit. However, previous mixup-based methods have an underlying prior knowledge that th

External link: http://arxiv.org/abs/2111.09833
