Showing 1 - 10 of 123 for search: '"Meng, Zaiqiao"'
Visual Language Models (VLMs) are essential for various tasks, particularly visual reasoning tasks, due to their robust multi-modal information integration, visual reasoning capabilities, and contextual awareness. However, existing VLMs' visual sp…
External link:
http://arxiv.org/abs/2407.14133
The context window of large language models has been extended to 128k tokens or more. However, language models still suffer from position bias and have difficulty in accessing and using the middle part of the context due to the lack of attention. We…
External link:
http://arxiv.org/abs/2406.17095
Retrieval-augmented generation (RAG) offers an effective approach for addressing question answering (QA) tasks. However, the imperfections of the retrievers in RAG models often result in the retrieval of irrelevant information, which could introduce…
External link:
http://arxiv.org/abs/2406.11460
Predicting drug-target interaction (DTI) is critical in the drug discovery process. Despite remarkable advances in recent DTI models through the integration of representations from diverse drug and target encoders, such models often struggle to captu…
External link:
http://arxiv.org/abs/2406.01651
Deep quantization methods have shown high efficiency on large-scale image retrieval. However, current models heavily rely on ground-truth information, hindering the application of quantization in label-hungry scenarios. A more realistic demand is to…
External link:
http://arxiv.org/abs/2404.04998
Medical open-domain question answering demands substantial access to specialized knowledge. Recent efforts have sought to decouple knowledge from model parameters, counteracting architectural scaling and allowing for training on common low-resource h…
External link:
http://arxiv.org/abs/2403.01924
Author:
Long, Zijun, Killick, George, Zhuang, Lipeng, Aragon-Camarasa, Gerardo, Meng, Zaiqiao, Mccreadie, Richard
State-of-the-art pre-trained image models predominantly adopt a two-stage approach: initial unsupervised pre-training on large-scale datasets followed by task-specific fine-tuning using Cross-Entropy loss (CE). However, it has been demonstrated that…
External link:
http://arxiv.org/abs/2402.14551
Transformer-based Large Language Models (LLMs) are pioneering advances in many natural language processing tasks; however, their exceptional capabilities are restricted within the preset context window of the Transformer. Position Embedding (PE) scaling…
External link:
http://arxiv.org/abs/2310.16450
Key information extraction (KIE) from scanned documents has gained increasing attention because of its applications in various domains. Although promising results have been achieved by some recent KIE approaches, they are usually built based on discr…
External link:
http://arxiv.org/abs/2310.16131
Large language models exhibit enhanced zero-shot performance on various tasks when fine-tuned with instruction-following data. Multimodal instruction-following models extend these capabilities by integrating both text and images. However, existing mo…
External link:
http://arxiv.org/abs/2308.16463