Showing 1 - 10 of 146 results for search: '"Huang, Yilun"'
High-performance Multimodal Large Language Models (MLLMs) rely heavily on data quality. This study introduces a novel dataset named Img-Diff, designed to enhance fine-grained image recognition in MLLMs by leveraging insights from contrastive learning…
External link:
http://arxiv.org/abs/2408.04594
The emergence of large-scale multi-modal generative models has drastically advanced artificial intelligence, introducing unprecedented levels of performance and functionality. However, optimizing these models remains challenging due to historically…
External link:
http://arxiv.org/abs/2407.11784
Author:
Qin, Zhen, Chen, Daoyuan, Zhang, Wenhao, Yao, Liuyi, Huang, Yilun, Ding, Bolin, Li, Yaliang, Deng, Shuiguang
The rapid development of large language models (LLMs) has been witnessed in recent years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from text to a broader spectrum of domains, attracting widespread attention due to the…
External link:
http://arxiv.org/abs/2407.08583
Despite the impressive capabilities of Multimodal Large Language Models (MLLMs) in integrating text and image modalities, challenges remain in accurately interpreting detailed visual elements. This paper presents an empirical study on enhancing MLLMs…
External link:
http://arxiv.org/abs/2401.17981
Author:
Chen, Daoyuan, Huang, Yilun, Ma, Zhijian, Chen, Hesen, Pan, Xuchen, Ge, Ce, Gao, Dawei, Xie, Yuexiang, Liu, Zhaoyang, Gao, Jinyang, Li, Yaliang, Ding, Bolin, Zhou, Jingren
The immense evolution in Large Language Models (LLMs) has underscored the importance of massive, heterogeneous, and high-quality data. A data recipe is a mixture of data from different sources for training LLMs, which plays a vital role in LLMs'…
External link:
http://arxiv.org/abs/2309.02033
The rapid advances in Vision Transformer (ViT) refresh the state-of-the-art performances in various vision tasks, overshadowing the conventional CNN-based models. This ignites a few recent striking-back research efforts in the CNN world showing that pure CNN…
External link:
http://arxiv.org/abs/2303.02165
Author:
Zhang, Qi, Yang, Zijian, Huang, Yilun, Chen, Ze, Cai, Zijian, Wang, Kangxu, Zheng, Jiewen, He, Jiarong, Gao, Jin
In this paper, we present our solution to the Multilingual Information Retrieval Across a Continuum of Languages (MIRACL) challenge of WSDM CUP 2023 (https://project-miracl.github.io/). Our solution focuses on enhancing the ranking stage…
External link:
http://arxiv.org/abs/2302.07010
In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series. DAMO-YOLO is extended from YOLO with some new technologies, including Neural Architecture…
External link:
http://arxiv.org/abs/2211.15444
Author:
Zhang, Qi, Yang, Zijian, Huang, Yilun, Chen, Ze, Cai, Zijian, Wang, Kangxu, Zheng, Jiewen, He, Jiarong, Gao, Jin
This paper mainly describes our winning solution (team name: www) to the Amazon ESCI Challenge of KDD CUP 2022, which achieves an NDCG score of 0.9043 and wins first place on task 1, the query-product ranking track. In this competition, participants…
External link:
http://arxiv.org/abs/2208.02958
The Cross-Market Recommendation task of WSDM CUP 2022 is about finding solutions to improve individual recommendation systems in resource-scarce target markets by leveraging data from similar high-resource source markets. Finally, our team OPDAI won…
External link:
http://arxiv.org/abs/2203.00897