Showing 1 - 10 of 4,125 for search: '"Xu,Ran"'
Author:
Zhang, Jieyu, Xue, Le, Song, Linxin, Wang, Jun, Huang, Weikai, Shu, Manli, Yan, An, Ma, Zixian, Niebles, Juan Carlos, Savarese, Silvio, Xiong, Caiming, Chen, Zeyuan, Krishna, Ranjay, Xu, Ran
With the rise of multimodal applications, instruction data has become critical for training multimodal language models capable of understanding complex image-based queries. Existing practices rely on powerful but costly large language models (LLMs)…
External link:
http://arxiv.org/abs/2412.07012
Author:
Du, Yuzhen, Hu, Teng, Zhang, Jiangning, Yi, Ran, Xu, Chengming, Hu, Xiaobin, Wu, Kai, Luo, Donghao, Wang, Yabiao, Ma, Lizhuang
Image restoration (IR) aims to recover high-quality images from degraded inputs, with recent deep learning advancements significantly enhancing performance. However, existing methods lack a unified training benchmark for iterations and configurations…
External link:
http://arxiv.org/abs/2412.03814
Author:
Awadalla, Anas, Xue, Le, Shu, Manli, Yan, An, Wang, Jun, Purushwalkam, Senthil, Shen, Sheng, Lee, Hannah, Lo, Oscar, Park, Jae Sung, Guha, Etash, Savarese, Silvio, Schmidt, Ludwig, Choi, Yejin, Xiong, Caiming, Xu, Ran
We introduce BLIP3-KALE, a dataset of 218 million image-text pairs that bridges the gap between descriptive synthetic captions and factual web-scale alt-text. KALE augments synthetic dense image captions with web-scale alt-text to generate factually…
External link:
http://arxiv.org/abs/2411.07461
Author:
Xu, Ran, Liu, Hui, Nag, Sreyashi, Dai, Zhenwei, Xie, Yaochen, Tang, Xianfeng, Luo, Chen, Li, Yang, Ho, Joyce C., Yang, Carl, He, Qi
Retrieval-augmented generation (RAG) enhances the question-answering (QA) abilities of large language models (LLMs) by integrating external knowledge. However, adapting general-purpose RAG systems to specialized fields such as science and medicine…
External link:
http://arxiv.org/abs/2410.17952
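The retrieval-then-generate flow this abstract builds on can be illustrated with a minimal sketch. Everything below (the toy corpus, the word-overlap retriever, and the prompt assembly) is an illustrative assumption, not the paper's actual system:

```python
# Minimal RAG sketch: retrieve relevant documents, then condition the
# prompt on them. All names and data here are illustrative placeholders.
from typing import List

CORPUS = [
    "Aspirin inhibits COX enzymes, reducing prostaglandin synthesis.",
    "Retrieval-augmented generation conditions an LLM on retrieved text.",
]

def retrieve(query: str, k: int = 1) -> List[str]:
    """Toy lexical retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(CORPUS, key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Prepend retrieved context so the LLM can ground its answer in it."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# In a real system this prompt would be sent to an LLM for generation.
print(build_prompt("How does aspirin reduce inflammation?"))
```

Adapting such a pipeline to specialized fields like science or medicine, as the abstract discusses, mainly changes the corpus and the retriever rather than this overall control flow.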
Author:
Ryoo, Michael S., Zhou, Honglu, Kendre, Shrikant, Qin, Can, Xue, Le, Shu, Manli, Savarese, Silvio, Xu, Ran, Xiong, Caiming, Niebles, Juan Carlos
We present xGen-MM-Vid (BLIP-3-Video): a multimodal language model for videos, particularly designed to efficiently capture temporal information over multiple frames. BLIP-3-Video takes advantage of the 'temporal encoder' in addition to the conventional…
External link:
http://arxiv.org/abs/2410.16267
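As a rough illustration of what a temporal encoder buys over per-frame tokenization, here is a generic token-pooling sketch. It is not BLIP-3-Video's actual module; the shapes and the pooling rule are assumptions:

```python
# Hedged sketch of a "temporal encoder": compress per-frame visual tokens
# into a small, fixed number of video-level tokens so the LLM sees a
# constant token budget regardless of video length.
import numpy as np

def temporal_pool(frame_tokens: np.ndarray, num_video_tokens: int) -> np.ndarray:
    """frame_tokens: (frames, tokens_per_frame, dim) -> (num_video_tokens, dim).

    Flatten all frame tokens and average them group-wise; real temporal
    encoders typically use learned attention pooling instead of a mean.
    """
    f, t, d = frame_tokens.shape
    flat = frame_tokens.reshape(f * t, d)
    groups = np.array_split(flat, num_video_tokens, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])

tokens = np.random.rand(8, 32, 256)        # 8 frames, 32 tokens per frame
video_tokens = temporal_pool(tokens, 16)   # fixed 16-token video summary
print(video_tokens.shape)                  # (16, 256)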
Vision-Language Models (VLMs) often generate plausible but incorrect responses to visual queries. However, reliably quantifying the effect of such hallucinations in free-form responses to open-ended queries is challenging, as it requires visually…
External link:
http://arxiv.org/abs/2410.13121
Author:
Zhang, Jianguo, Lan, Tian, Zhu, Ming, Liu, Zuxin, Hoang, Thai, Kokane, Shirley, Yao, Weiran, Tan, Juntao, Prabhakar, Akshara, Chen, Haolin, Liu, Zhiwei, Feng, Yihao, Awalgaonkar, Tulika, Murthy, Rithesh, Hu, Eric, Chen, Zeyuan, Xu, Ran, Niebles, Juan Carlos, Heinecke, Shelby, Wang, Huan, Savarese, Silvio, Xiong, Caiming
Autonomous agents powered by large language models (LLMs) have attracted significant research interest. However, the open-source community faces many challenges in developing specialized models for agent tasks, driven by the scarcity of high-quality…
External link:
http://arxiv.org/abs/2409.03215
Author:
Qin, Can, Xia, Congying, Ramakrishnan, Krithika, Ryoo, Michael, Tu, Lifu, Feng, Yihao, Shu, Manli, Zhou, Honglu, Awadalla, Anas, Wang, Jun, Purushwalkam, Senthil, Xue, Le, Zhou, Yingbo, Wang, Huan, Savarese, Silvio, Niebles, Juan Carlos, Chen, Zeyuan, Xu, Ran, Xiong, Caiming
We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions. Building on recent advancements, such as OpenAI's Sora, we explore the latent diffusion model (LDM) architecture…
External link:
http://arxiv.org/abs/2408.12590
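For readers unfamiliar with the latent diffusion model (LDM) setup the abstract builds on, this is a heavily simplified sampling loop. The denoiser stub, the update rule, and all shapes are placeholders, not xGen-VideoSyn-1's components:

```python
# Minimal sketch of latent-diffusion sampling: start from Gaussian noise
# in latent space and iteratively remove predicted noise. A VAE decoder
# would map the final latents to video frames; it is omitted here.
import numpy as np

def denoiser(z: np.ndarray, t: int, text_emb: np.ndarray) -> np.ndarray:
    """Stub for a learned network predicting the noise in latent z at step t."""
    return 0.1 * z  # placeholder; a real model conditions on t and text_emb

def sample_latent(shape, steps: int, text_emb: np.ndarray) -> np.ndarray:
    """Iterative denoising from pure noise toward a clean latent."""
    z = np.random.randn(*shape)
    for t in reversed(range(steps)):
        eps = denoiser(z, t, text_emb)
        z = z - eps  # simplified update; real samplers follow a noise schedule
    return z

latents = sample_latent((16, 4, 32, 32), steps=50,
                        text_emb=np.zeros(512))  # frames x channels x H x W
print(latents.shape)
```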
Author:
Xue, Le, Shu, Manli, Awadalla, Anas, Wang, Jun, Yan, An, Purushwalkam, Senthil, Zhou, Honglu, Prabhu, Viraj, Dai, Yutong, Ryoo, Michael S., Kendre, Shrikant, Zhang, Jieyu, Qin, Can, Zhang, Shu, Chen, Chia-Chih, Yu, Ning, Tan, Juntao, Awalgaonkar, Tulika Manoj, Heinecke, Shelby, Wang, Huan, Choi, Yejin, Schmidt, Ludwig, Chen, Zeyuan, Savarese, Silvio, Niebles, Juan Carlos, Xiong, Caiming, Xu, Ran
This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated datasets, a training recipe, model architectures, and a resulting suite of LMMs. xGen-MM…
External link:
http://arxiv.org/abs/2408.08872
Author:
Shen, Jiaming, Xu, Ran, Jun, Yennie, Qin, Zhen, Liu, Tianqi, Yang, Carl, Liang, Yi, Baumgartner, Simon, Bendersky, Michael
Reward models (RMs) are crucial for aligning large language models (LLMs) with human preferences. They are trained using preference datasets where each example consists of one input prompt, two responses, and a preference label. As curating a high-quality…
External link:
http://arxiv.org/abs/2407.16008
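The preference-data format the abstract describes (one prompt, two responses, a preference label) pairs naturally with a Bradley-Terry style loss. The sketch below uses a stub scorer and made-up data purely for illustration; it is not the paper's model or dataset:

```python
# Pairwise preference loss commonly used to train reward models:
# push the reward of the preferred response above the rejected one.
import math

def reward(prompt: str, response: str) -> float:
    """Stub for a learned scalar reward; response length stands in here."""
    return float(len(response))

def preference_loss(prompt: str, chosen: str, rejected: str) -> float:
    """Bradley-Terry loss: -log sigmoid(r(chosen) - r(rejected))."""
    margin = reward(prompt, chosen) - reward(prompt, rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# One preference example: prompt, two responses, and an implied label
# (the "chosen" response is preferred by the annotator).
example = {
    "prompt": "Explain overfitting in one sentence.",
    "chosen": "Overfitting is when a model memorizes training noise.",
    "rejected": "Models are good.",
}
print(preference_loss(example["prompt"], example["chosen"], example["rejected"]))
```

Curating such pairs at scale is exactly the bottleneck the abstract points to, since each example needs a reliable human or model judgment of which response is better.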