Showing 1 - 10 of 2,314 for search: '"LI, Yanwei"'
Author:
Sun, Haoze, Li, Wenbo, Liu, Jiayue, Zhou, Kaiwen, Chen, Yongqiang, Guo, Yong, Li, Yanwei, Pei, Renjing, Peng, Long, Yang, Yujiu
Generalization has long been a central challenge in real-world image restoration. While recent diffusion-based restoration methods, which leverage generative priors from text-to-image models, have made progress in recovering more realistic details, t
External link:
http://arxiv.org/abs/2412.00878
Author:
Li, Bo, Zhang, Yuanhan, Guo, Dong, Zhang, Renrui, Li, Feng, Zhang, Hao, Zhang, Kaichen, Zhang, Peiyuan, Li, Yanwei, Liu, Ziwei, Li, Chunyuan
We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision
External link:
http://arxiv.org/abs/2408.03326
Author:
Fu, Chaoyou, Dai, Yuhan, Luo, Yongdong, Li, Lei, Ren, Shuhuai, Zhang, Renrui, Wang, Zihan, Zhou, Chenyu, Shen, Yunhang, Zhang, Mengdan, Chen, Peixian, Li, Yanwei, Lin, Shaohui, Zhao, Sirui, Li, Ke, Xu, Tong, Zheng, Xiawu, Chen, Enhong, Ji, Rongrong, Sun, Xing
In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus remains on developing their capabilities in static image understanding. T
External link:
http://arxiv.org/abs/2405.21075
Our paper is devoted to the study of Peng's stochastic maximum principle (SMP) for a stochastic control problem composed of a controlled forward stochastic differential equation (SDE) as dynamics and a controlled backward SDE which defines the cost f
External link:
http://arxiv.org/abs/2404.06826
Author:
Li, Yanwei, Zhang, Yuechen, Wang, Chengyao, Zhong, Zhisheng, Chen, Yixin, Chu, Ruihang, Liu, Shaoteng, Jia, Jiaya
In this work, we introduce Mini-Gemini, a simple and effective framework enhancing multi-modality Vision Language Models (VLMs). Despite the advancements in VLMs facilitating basic visual dialog and reasoning, a performance gap persists compared to a
External link:
http://arxiv.org/abs/2403.18814
Author:
Liu, Shaoteng, Yuan, Haoqi, Hu, Minda, Li, Yanwei, Chen, Yukang, Liu, Shu, Lu, Zongqing, Jia, Jiaya
Large Language Models (LLMs) have demonstrated proficiency in utilizing various tools by coding, yet they face limitations in handling intricate logic and precise control. In embodied tasks, high-level planning is amenable to direct coding, while low
External link:
http://arxiv.org/abs/2402.19299
Although perception systems have made remarkable advancements in recent years, they still rely on explicit human instruction or pre-defined categories to identify the target objects before executing visual recognition tasks. Such systems cannot activ
External link:
http://arxiv.org/abs/2308.00692
Author:
Deng, Ruining, Li, Yanwei, Li, Peize, Wang, Jiacheng, Remedios, Lucas W., Agzamkhodjaev, Saydolimkhon, Asad, Zuhayr, Liu, Quan, Cui, Can, Wang, Yaohong, Wang, Yihan, Tang, Yucheng, Yang, Haichun, Huo, Yuankai
Multi-class cell segmentation in high-resolution Giga-pixel whole slide images (WSI) is critical for various clinical applications. Training such an AI model typically requires labor-intensive pixel-wise manual annotation from experienced domain expe
External link:
http://arxiv.org/abs/2306.00047
This paper aims to efficiently enable Large Language Models (LLMs) to use multimodal tools. Advanced proprietary LLMs, such as ChatGPT and GPT-4, have shown great potential for tool usage through sophisticated prompt engineering. Nevertheless, these
External link:
http://arxiv.org/abs/2305.18752