Showing 1 - 9 of 9 for search: '"Qiu, Longtian"'
Author:
Gao, Peng, Zhuo, Le, Liu, Dongyang, Du, Ruoyi, Luo, Xu, Qiu, Longtian, Zhang, Yuhang, Lin, Chen, Huang, Rongjie, Geng, Shijie, Zhang, Renrui, Xi, Junlin, Shao, Wenqi, Jiang, Zhengkai, Yang, Tianshuo, Ye, Weicai, Tong, He, He, Jingwen, Qiao, Yu, Li, Hongsheng
Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we int…
External link:
http://arxiv.org/abs/2405.05945
Author:
Liu, Dongyang, Zhang, Renrui, Qiu, Longtian, Huang, Siyuan, Lin, Weifeng, Zhao, Shitian, Geng, Shijie, Lin, Ziyi, Jin, Peng, Zhang, Kaipeng, Shao, Wenqi, Xu, Chao, He, Conghui, He, Junjun, Shao, Hao, Lu, Pan, Li, Hongsheng, Qiao, Yu, Gao, Peng
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series developed upon SPHINX. To improve the architecture and training efficiency, we modify the SPHINX framework by removing redundant visual encoders, bypassing fully-padde…
External link:
http://arxiv.org/abs/2402.05935
Image captioning aims at generating descriptive and meaningful textual descriptions of images, enabling a broad range of vision-language applications. Prior works have demonstrated that harnessing the power of Contrastive Image Language Pre-training…
External link:
http://arxiv.org/abs/2401.02347
Author:
Fu, Chaoyou, Zhang, Renrui, Wang, Zihan, Huang, Yubo, Zhang, Zhengye, Qiu, Longtian, Ye, Gaoxiang, Shen, Yunhang, Zhang, Mengdan, Chen, Peixian, Zhao, Sirui, Lin, Shaohui, Jiang, Deqiang, Yin, Di, Gao, Peng, Li, Ke, Li, Hongsheng, Sun, Xing
The surge of interest towards Multi-modal Large Language Models (MLLMs), e.g., GPT-4V(ision) from OpenAI, has marked a significant trend in both academia and industry. They endow Large Language Models (LLMs) with powerful capabilities in visual under…
External link:
http://arxiv.org/abs/2312.12436
Author:
Lin, Ziyi, Liu, Chris, Zhang, Renrui, Gao, Peng, Qiu, Longtian, Xiao, Han, Qiu, Han, Lin, Chen, Shao, Wenqi, Chen, Keqin, Han, Jiaming, Huang, Siyuan, Zhang, Yichi, He, Xuming, Li, Hongsheng, Qiao, Yu
We present SPHINX, a versatile multi-modal large language model (MLLM) with a joint mixing of model weights, tuning tasks, and visual embeddings. First, for stronger vision-language alignment, we unfreeze the large language model (LLM) during pre-tra…
External link:
http://arxiv.org/abs/2311.07575
Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions. Recently, Contrastive Language-Image Pre-training (CLIP) has shown great potential in providing interaction prior for HOI detectors via kno…
External link:
http://arxiv.org/abs/2303.15786
Masked Autoencoders (MAE) have shown promising performance in self-supervised learning for both 2D and 3D computer vision. However, existing MAE-style methods can only learn from the data of a single modality, i.e., either images or point clouds, whi…
External link:
http://arxiv.org/abs/2302.14007
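The entry above describes MAE-style self-supervised learning, whose core mechanism is hiding a large fraction of input patches and encoding only the visible remainder. A minimal sketch of that masking step (toy random patches standing in for real image or point-cloud tokens; not the paper's actual method):

```python
import numpy as np

# Toy illustration of MAE-style random masking: hide a high fraction of
# patch tokens so the encoder processes only the visible subset.
rng = np.random.default_rng(0)

num_patches, dim, mask_ratio = 16, 8, 0.75
patches = rng.normal(size=(num_patches, dim))  # stand-in for patch embeddings

num_keep = int(num_patches * (1 - mask_ratio))
perm = rng.permutation(num_patches)
keep_idx = np.sort(perm[:num_keep])   # indices of the visible patches
visible = patches[keep_idx]           # the encoder would see only these

print(visible.shape)  # (4, 8): 25% of 16 patches survive the mask
```

The 75% mask ratio used here mirrors the ratio commonly reported for image MAE; the decoder (not shown) would reconstruct the hidden patches from the encoded visible ones.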
Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual representations with great transferability, which achieves promising accuracy for zero-shot classification. To further improve its downstream performance, existing works pr…
External link:
http://arxiv.org/abs/2209.14169
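The entry above refers to CLIP's zero-shot classification, which works by comparing an image embedding against text embeddings of class prompts. A schematic sketch of that comparison (toy hand-picked vectors stand in for real CLIP encoder outputs; the actual model is not loaded here):

```python
import numpy as np

def l2_normalize(x):
    """Normalize vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical text embeddings for prompts like "a photo of a {class}".
class_names = ["cat", "dog", "car"]
text_emb = l2_normalize(np.array([
    [1.0, 0.1, 0.0],   # "cat"
    [0.0, 1.0, 0.1],   # "dog"
    [0.1, 0.0, 1.0],   # "car"
]))

# Hypothetical embedding of one input image (closest to the "cat" prompt).
image_emb = l2_normalize(np.array([0.9, 0.2, 0.05]))

# Zero-shot prediction: pick the class whose text embedding has the
# highest cosine similarity with the image embedding.
logits = image_emb @ text_emb.T
pred = class_names[int(np.argmax(logits))]
print(pred)  # -> cat
```

No class-specific training happens: swapping in new prompt embeddings immediately yields a classifier for new classes, which is the transferability property the abstract refers to.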
Contrastive Language-Image Pre-training (CLIP) has drawn increasing attention recently for its transferable visual representation learning. However, due to the semantic gap within datasets, CLIP's pre-trained image-text alignment becomes sub-optimal…
External link:
http://arxiv.org/abs/2112.02399