Výsledky vyhledávání

Report

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data ai

Externí odkaz: http://arxiv.org/abs/2406.08418

Zobrazit plný text záznamu

Report

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (

Externí odkaz: http://arxiv.org/abs/2404.16821

Zobrazit plný text záznamu

Report

Teaching MLP More Graph Information: A Three-stage Multitask Knowledge Distillation Framework

Autor: Li, Junxian, Shi, Bin, Cui, Erfei, Wei, Hua, Zheng, Qinghua

We study the challenging problem for inference tasks on large-scale graph datasets of Graph Neural Networks: huge time and memory consumption, and try to overcome it by reducing reliance on graph structure. Even though distilling graph knowledge to s

Externí odkaz: http://arxiv.org/abs/2403.01079

Zobrazit plný text záznamu

Report

ControlLLM: Augment Language Models with Tools by Searching on Graphs

Autor: Liu, Zhaoyang, Lai, Zeqiang, Gao, Zhangwei, Cui, Erfei, Li, Ziheng, Zhu, Xizhou, Lu, Lewei, Chen, Qifeng, Qiao, Yu, Dai, Jifeng, Wang, Wenhai

We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks. Despite the remarkable performance of LLMs, they still struggle with tool invocation due to ambiguou

Externí odkaz: http://arxiv.org/abs/2310.17796

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání