Showing 1 - 10
of 804
for search: '"He, Ju"'
While significant advancements have been made in compressed representations for text embeddings in large language models (LLMs), the compression of visual tokens in large multi-modal models (LMMs) has remained a largely overlooked area. In this work,
External link:
http://arxiv.org/abs/2406.20092
This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation.
External link:
http://arxiv.org/abs/2406.09416
Author:
He, Ju, Yu, Qihang, Shin, Inkyu, Deng, Xueqing, Yuille, Alan, Shen, Xiaohui, Chen, Liang-Chieh
Video segmentation requires consistently segmenting and tracking objects over time. Due to the quadratic dependency on input size, directly applying self-attention to video segmentation with high-resolution input features poses significant challenges
External link:
http://arxiv.org/abs/2311.18537
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing objects from an open set of categories. One way to address this challenge is to leverage multi-modal models, such as CLIP, to provide image and text features in a
External link:
http://arxiv.org/abs/2308.02487
In this work, we present a robust approach for joint part and object segmentation. Specifically, we reformulate object and part segmentation as an optimization problem and build a hierarchical feature representation including pixel, part, and object-
External link:
http://arxiv.org/abs/2306.07404
Author:
He, Ju, Yang, Shuo, Yang, Shaokang, Kortylewski, Adam, Yuan, Xiaoding, Chen, Jie-Neng, Liu, Shuai, Yang, Cheng, Yu, Qihang, Yuille, Alan
It is natural to represent objects in terms of their parts. This has the potential to improve the performance of algorithms for object recognition and segmentation but can also help for downstream tasks like activity recognition. Research on part-bas
External link:
http://arxiv.org/abs/2112.00933
Author:
Zhao, Bingchen, Yu, Shaozuo, Ma, Wufei, Yu, Mingxin, Mei, Shenxiao, Wang, Angtian, He, Ju, Yuille, Alan, Kortylewski, Adam
Enhancing the robustness of vision algorithms in real-world scenarios is challenging. One reason is that existing robustness benchmarks are limited, as they either rely on synthetic data or ignore the effects of individual nuisance factors. We introd
External link:
http://arxiv.org/abs/2111.14341
Author:
Xiao, Junfei, Jing, Longlong, Zhang, Lin, He, Ju, She, Qi, Zhou, Zongwei, Yuille, Alan, Li, Yingwei
Semi-supervised video action recognition tends to enable deep neural networks to achieve remarkable performance even with very limited labeled data. However, existing methods are mainly transferred from current image-based methods (e.g., FixMatch). W
External link:
http://arxiv.org/abs/2111.13241
Mixup-based augmentation has been found to be effective for generalizing models during training, especially for Vision Transformers (ViTs) since they can easily overfit. However, previous mixup-based methods have an underlying prior knowledge that th
External link:
http://arxiv.org/abs/2111.09833
Published in:
In Journal of Urban Management June 2024 13(2):217-231