Showing 1 - 10 of 40 for search: '"Mei, Jieru"'
This paper studies the vulnerabilities of transformer-based Large Language Models (LLMs) to jailbreaking attacks, focusing specifically on the optimization-based Greedy Coordinate Gradient (GCG) strategy. We first observe a positive correlation between …
External link:
http://arxiv.org/abs/2410.09040
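For context, a minimal sketch of the coordinate-selection loop at the core of GCG, assuming the standard formulation (Zou et al., 2023): gradients with respect to one-hot token coordinates rank candidate suffix substitutions, and the best-scoring swap is kept greedily. The tiny random linear "model" below is a stand-in for a real LLM, not this paper's setup.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
V, D, L = 100, 32, 8              # vocab size, embed dim, suffix length (toy scale)
embed = torch.randn(V, D)         # stand-in token embeddings
W_out = torch.randn(D, V)         # stand-in LM head

def loss_fn(one_hot_suffix):
    """Toy attack loss: cross-entropy of the 'model' output against a fixed target."""
    h = one_hot_suffix @ embed                  # (L, D) soft suffix embeddings
    logits = h @ W_out                          # (L, V)
    target = torch.zeros(L, dtype=torch.long)   # pretend target tokens
    return F.cross_entropy(logits, target)

suffix = torch.randint(0, V, (L,))              # current adversarial suffix
for step in range(10):
    one_hot = F.one_hot(suffix, V).float().requires_grad_(True)
    loss = loss_fn(one_hot)
    loss.backward()
    # Gradient w.r.t. the one-hot coordinates ranks promising token swaps.
    top_k = (-one_hot.grad).topk(8, dim=1).indices      # (L, 8) candidates
    best, best_loss = suffix, loss.item()
    for _ in range(32):                                 # evaluate sampled swaps
        cand = suffix.clone()
        pos = torch.randint(0, L, (1,)).item()
        cand[pos] = top_k[pos, torch.randint(0, 8, (1,)).item()]
        with torch.no_grad():
            cand_loss = loss_fn(F.one_hot(cand, V).float()).item()
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    suffix = best                                       # greedy: keep the best swap
    print(f"step {step}: loss {best_loss:.4f}")
```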
In this paper, we introduce a hierarchical transformer-based model designed for sophisticated image segmentation tasks, effectively bridging the granularity of part segmentation with the comprehensive scope of object segmentation. At the heart of our …
External link:
http://arxiv.org/abs/2409.01353
Author:
Li, Xianhang, Tu, Haoqin, Hui, Mude, Wang, Zeyu, Zhao, Bingchen, Xiao, Junfei, Ren, Sucheng, Mei, Jieru, Liu, Qing, Zheng, Huangjie, Zhou, Yuyin, Xie, Cihang
Web-crawled image-text pairs are inherently noisy. Prior studies demonstrate that semantically aligning and enriching textual descriptions of these pairs can significantly enhance model training across various vision-language tasks, particularly text …
External link:
http://arxiv.org/abs/2406.08478
Author:
Ren, Sucheng, Li, Xianhang, Tu, Haoqin, Wang, Feng, Shu, Fangxun, Zhang, Lei, Mei, Jieru, Yang, Linjie, Wang, Peng, Wang, Heng, Yuille, Alan, Xie, Cihang
The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks. This paper shows that Mamba's visual capability can be significantly enhanced through autoregressive pretraining …
External link:
http://arxiv.org/abs/2406.07537
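For context, one common form of autoregressive pretraining for vision backbones is next-patch prediction: raster-order the image patches and regress each patch from its prefix. A minimal sketch follows; the GRU is a hypothetical stand-in for a Mamba block (chosen only because it is causal and built into PyTorch), and the paper's exact objective may differ.

```python
import torch
import torch.nn as nn

patch_dim, hidden = 48, 64          # e.g. 4x4x3 patches
backbone = nn.GRU(patch_dim, hidden, batch_first=True)   # causal by construction
head = nn.Linear(hidden, patch_dim)                      # regress the next patch
opt = torch.optim.AdamW(list(backbone.parameters()) + list(head.parameters()), lr=1e-3)

images = torch.randn(8, 3, 16, 16)                       # toy batch
patches = images.unfold(2, 4, 4).unfold(3, 4, 4)         # (8, 3, 4, 4, 4, 4) windows
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(8, 16, patch_dim)  # raster order

for step in range(5):
    h, _ = backbone(patches[:, :-1])    # states after seeing patches 0..k-1
    pred = head(h)                      # predict patch k from its prefix
    loss = nn.functional.mse_loss(pred, patches[:, 1:])
    opt.zero_grad(); loss.backward(); opt.step()
    print(f"step {step}: next-patch MSE {loss.item():.4f}")
```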
Author:
Ren, Sucheng, Huang, Xiaoke, Li, Xianhang, Xiao, Junfei, Mei, Jieru, Wang, Zeyu, Yuille, Alan, Zhou, Yuyin
This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks -- such as cross-modal synthesis, image segmentation, denoising, and inpainting -- within a unified image-to-image generation …
External link:
http://arxiv.org/abs/2406.05565
Author:
Wang, Feng, Wang, Jiahao, Ren, Sucheng, Wei, Guoyizhe, Mei, Jieru, Shao, Wei, Zhou, Yuyin, Yuille, Alan, Xie, Cihang
This paper identifies artifacts within the feature maps of Vision Mamba, similar to those previously found in Vision Transformers. These artifacts, corresponding to high-norm tokens emerging in low-information background areas of images, appear much more severe in …
External link:
http://arxiv.org/abs/2405.14858
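The high-norm-token observation above suggests a simple diagnostic: score each token of a feature map by its L2 norm and flag outliers. A minimal sketch, with random features standing in for real Vision Mamba activations and a 3-sigma rule chosen here purely for illustration:

```python
import torch

feats = torch.randn(196, 384)            # (tokens, dim), e.g. a 14x14 token grid
feats[17] *= 30                          # plant a synthetic high-norm artifact
norms = feats.norm(dim=1)                # per-token L2 norm
thresh = norms.mean() + 3 * norms.std()  # simple outlier rule (an assumption here)
artifact_idx = (norms > thresh).nonzero(as_tuple=True)[0]
print("high-norm artifact tokens:", artifact_idx.tolist())   # -> [17]
```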
Segmenting brain tumors is complex due to their diverse appearances and scales. Brain metastases, the most common type of brain tumor, are a frequent complication of cancer. Therefore, an effective segmentation model for brain metastases must adeptly …
External link:
http://arxiv.org/abs/2403.15735
In this work, we introduce SPFormer, a novel Vision Transformer enhanced by superpixel representation. Addressing the limitations of traditional Vision Transformers' fixed-size, non-adaptive patch partitioning, SPFormer employs superpixels that adapt …
External link:
http://arxiv.org/abs/2401.02931
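For context, a minimal sketch of superpixel tokenization in this spirit: mean-pool per-pixel features over each superpixel's support to form content-adaptive tokens instead of fixed square patches. The random superpixel map stands in for a real oversegmentation (e.g. SLIC); this is an illustration, not SPFormer's exact module.

```python
import torch

H, W, C, S = 32, 32, 64, 16
feats = torch.randn(H * W, C)                    # per-pixel features, flattened
sp_id = torch.randint(0, S, (H * W,))            # superpixel label per pixel

tokens = torch.zeros(S, C).index_add_(0, sp_id, feats)   # sum features per superpixel
counts = torch.zeros(S).index_add_(0, sp_id, torch.ones(H * W))
tokens = tokens / counts.clamp(min=1).unsqueeze(1)       # mean-pool -> (S, C) tokens
print(tokens.shape)   # torch.Size([16, 64]): adaptive tokens, not fixed patches
```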
Author:
Xiao, Junfei, Zhou, Ziqi, Li, Wenxuan, Lan, Shiyi, Mei, Jieru, Yu, Zhiding, Yuille, Alan, Zhou, Yuyin, Xie, Cihang
This paper introduces ProLab, a novel approach using property-level label space for creating strong interpretable segmentation models. Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense …
External link:
http://arxiv.org/abs/2312.13764
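For context, a minimal sketch of a property-level label space in this spirit: each class is represented by pooled embeddings of its descriptive properties, and a pixel or segment feature is scored against those embeddings rather than a bare class index. The property lists and the random "text encoder" below are illustrative assumptions, not ProLab's actual components.

```python
import torch

props = {
    "cat": ["has fur", "has whiskers", "four-legged"],
    "car": ["has wheels", "made of metal", "has windows"],
}

def encode(text: str) -> torch.Tensor:
    """Stand-in text encoder (assumption): deterministic random unit vector per string."""
    g = torch.Generator().manual_seed(hash(text) % (2**31))
    return torch.nn.functional.normalize(torch.randn(64, generator=g), dim=0)

# Class embedding = mean of its property embeddings.
class_emb = torch.stack([torch.stack([encode(p) for p in ps]).mean(0)
                         for ps in props.values()])
pixel_feat = torch.nn.functional.normalize(torch.randn(64), dim=0)
scores = class_emb @ pixel_feat                  # similarity per class
print(dict(zip(props, scores.tolist())))
```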
This paper introduces an efficient strategy to transform Large Language Models (LLMs) into Multi-Modal Large Language Models (MLLMs). By conceptualizing this transformation as a domain adaptation process, i.e., transitioning from text understanding to …
External link:
http://arxiv.org/abs/2312.11420