Showing 1 - 10 of 327 for search: '"Song, JiaJun"'
Authors:
Fu, Ling, Yang, Biao, Kuang, Zhebin, Song, Jiajun, Li, Yuzhe, Zhu, Linghao, Luo, Qidi, Wang, Xinyu, Lu, Hao, Huang, Mingxin, Li, Zhang, Tang, Guozhi, Shan, Bin, Lin, Chunhui, Liu, Qi, Wu, Binghong, Feng, Hao, Liu, Hao, Huang, Can, Tang, Jingqun, Chen, Wei, Jin, Lianwen, Liu, Yuliang, Bai, Xiang
Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest recently. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilit
External link:
http://arxiv.org/abs/2501.00321
Authors:
Zhang, Chi, Song, Jiajun, Li, Siyu, Liang, Yitao, Ma, Yuxi, Wang, Wei, Zhu, Yixin, Zhu, Song-Chun
Mathematics olympiads are prestigious competitions, with problem proposing and solving highly honored. Building artificial intelligence that proposes and solves olympiads presents an unresolved challenge in automated theorem discovery and proving, es
External link:
http://arxiv.org/abs/2412.10673
Authors:
Zhou, Pengfei, Peng, Xiaopeng, Song, Jiajun, Li, Chuanhao, Xu, Zhaopan, Yang, Yue, Guo, Ziyao, Zhang, Hao, Lin, Yuqi, He, Yefei, Zhao, Lirui, Liu, Shuo, Li, Tianhua, Xie, Yuxuan, Chang, Xiaojun, Qiao, Yu, Shao, Wenqi, Zhang, Kaipeng
Multimodal Large Language Models (MLLMs) have made significant strides in visual understanding and generation tasks. However, generating interleaved image-text content remains a challenge, which requires integrated multimodal understanding and genera
External link:
http://arxiv.org/abs/2411.18499
Mixture-of-experts-based (MoE-based) diffusion models have shown their scalability and ability to generate high-quality images, making them a promising choice for efficient model scaling. However, they rely on expert parallelism across GPUs, necessit
External link:
http://arxiv.org/abs/2411.16786
Large language models (LLMs) such as GPT-4 sometimes appear to be creative, solving novel tasks often with a few demonstrations in the prompt. These tasks require the models to generalize on distributions different from those from training data -- wh
External link:
http://arxiv.org/abs/2408.09503
Asynchronous Federated Learning (AFL) confronts inherent challenges arising from the heterogeneity of devices (e.g., their computation capacities) and low-bandwidth environments, both potentially causing stale model updates (e.g., local gradients) fo
External link:
http://arxiv.org/abs/2407.05125
Authors:
Maryam, Hiba, Fu, Ling, Song, Jiajun, Shafayet, Tajrian ABM, Luo, Qidi, Bai, Xiang, Liu, Yuliang
The development of Urdu scene text detection, recognition, and Visual Question Answering (VQA) technologies is crucial for advancing accessibility, information retrieval, and linguistic diversity in digital content, facilitating better understanding
External link:
http://arxiv.org/abs/2405.12533
Food computing brings various perspectives to computer vision like vision-based food analysis for nutrition and health. As a fundamental task in food computing, food detection needs Zero-Shot Detection (ZSD) on novel unseen food objects to support re
External link:
http://arxiv.org/abs/2402.09242
Food detection is becoming a fundamental task in food computing that supports various multimedia applications, including food recommendation and dietary monitoring. To deal with real-world scenarios, food detection needs to localize and recognize nov
External link:
http://arxiv.org/abs/2310.04689
Authors:
Song, Jiajun, Zhong, Yiqiao
Transformers are widely used to extract semantic meanings from input tokens, yet they usually operate as black-box models. In this paper, we present a simple yet informative decomposition of hidden states (or embeddings) of trained transformers into
External link:
http://arxiv.org/abs/2310.04861