Search results: "Visual Question Answering"

Academic article

Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering

Authors: Zhongjian Hu, Peng Yang, Fengyuan Liu, Yuan Meng, Xingyu Liu

Published in: Big Data Mining and Analytics, Vol 7, Iss 3, Pp 843-857 (2024)

Previous works employ the Large Language Model (LLM) like GPT-3 for knowledge-based Visual Question Answering (VQA). We argue that the inferential capacity of LLM can be enhanced through knowledge injection. Although methods that utilize knowledge gr…

External link: https://doaj.org/article/4a5c02e765b940a88f6ed65e4d31f4d3
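The abstract is cut off before the method is described. The sketch below illustrates the general idea of knowledge injection in a hedged, generic form: a few-shot text prompt for a text-only LLM in which each question is preceded by an image caption and by retrieved knowledge snippets. It rests on several assumptions that do not come from the paper:

- the prompt layout;
- the hypothetical `TextVQAExample` class;
- the function names.

How the knowledge is retrieved and which LLM receives the prompt are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TextVQAExample:
    """A VQA item already converted to text (hypothetical structure, not from the paper)."""
    caption: str                                         # textual description of the image
    question: str
    knowledge: List[str] = field(default_factory=list)  # knowledge snippets to inject
    answer: Optional[str] = None                         # set for in-context examples only


def render_example(ex: TextVQAExample) -> str:
    """Render one example, placing the injected knowledge between context and question."""
    lines = [f"Context: {ex.caption}"]
    if ex.knowledge:
        lines.append("Knowledge: " + " ".join(ex.knowledge))
    lines.append(f"Question: {ex.question}")
    # The query example gets an empty answer slot for the LLM to complete.
    lines.append("Answer:" if ex.answer is None else f"Answer: {ex.answer}")
    return "\n".join(lines)


def build_knowledge_prompt(shots: List[TextVQAExample], query: TextVQAExample,
                           instruction: str = "Answer the question using the context and knowledge.") -> str:
    """Join an instruction, the few-shot examples and the query into one prompt string."""
    blocks = [instruction] + [render_example(s) for s in shots] + [render_example(query)]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    shot = TextVQAExample(
        caption="A man rides a wave on a board.",
        question="What sport is this?",
        knowledge=["Surfing is the sport of riding waves on a board."],
        answer="surfing",
    )
    query = TextVQAExample(
        caption="A red double-decker bus drives down a street.",
        question="Which city is this bus most associated with?",
        knowledge=["Red double-decker buses are a well-known symbol of London."],
    )
    print(build_knowledge_prompt([shot], query))
```

The text returned by `build_knowledge_prompt` would then be sent to whichever LLM is used. The model's completion after the final `Answer:` is taken as the prediction.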

Academic article

MKEAH: Multimodal knowledge extraction and accumulation based on hyperplane embedding for knowledge-based visual question answering

Authors: Heng Zhang, Zhihua Wei, Guanming Liu, Rui Wang, Ruibin Mu, Chuanbao Liu, Aiquan Yuan, Guodong Cao, Ning Hu

Published in: Virtual Reality & Intelligent Hardware, Vol 6, Iss 4, Pp 280-291 (2024)

Background: External knowledge representations play an essential role in knowledge-based visual question and answering to better understand complex scenarios in the open world. Recent entity-relationship embedding approaches are deficient in represen…

External link: https://doaj.org/article/1c531a501ec4475e8e8cb3b13222674e
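The title refers to hyperplane embedding, but the abstract is truncated before the model is described. The sketch below therefore shows only the widely used TransH-style hyperplane projection. It is an assumption about what such an embedding involves, not MKEAH's actual formulation. The scheme works as follows:

- Each relation r has a normal vector `w_r` and a translation vector `d_r` that lies in the hyperplane.
- Head and tail embeddings are projected onto that hyperplane.
- A triple (h, r, t) is considered plausible when h⊥ + d_r lies close to t⊥.

All names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def unit(w: Vector) -> Vector:
    """Scale w to unit length so it can serve as a hyperplane normal."""
    norm = math.sqrt(dot(w, w))
    if norm == 0.0:
        raise ValueError("normal vector must be non-zero")
    return [x / norm for x in w]


def project_to_hyperplane(e: Vector, w_r: Vector) -> Vector:
    """Remove the component of e along the normal: e - (w . e) w, with w = unit(w_r)."""
    w = unit(w_r)
    s = dot(w, e)
    return [x - s * wi for x, wi in zip(e, w)]


def transh_score(h: Vector, t: Vector, w_r: Vector, d_r: Vector) -> float:
    """Squared distance ||h_perp + d_r - t_perp||^2; lower means a more plausible triple.

    d_r is assumed to lie in the hyperplane (orthogonal to w_r), as in TransH.
    """
    h_p = project_to_hyperplane(h, w_r)
    t_p = project_to_hyperplane(t, w_r)
    return sum((a + d - b) ** 2 for a, d, b in zip(h_p, d_r, t_p))


if __name__ == "__main__":
    w_r, d_r = [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]
    print(transh_score([0.0, 0.0, 5.0], [1.0, 0.0, -3.0], w_r, d_r))  # 0.0
    print(transh_score([0.0, 0.0, 5.0], [0.0, 1.0, 0.0], w_r, d_r))   # 2.0
```

Only the in-plane components are compared. In the first example the head and tail differ along `w_r` (5 versus -3) and still fit the relation exactly. This is how TransH lets a single entity play different roles under different relations.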

Academic article

Vision transformer-based visual language understanding of the construction process

Authors: Bin Yang, Binghan Zhang, Yilong Han, Boda Liu, Jiniming Hu, Yiming Jin

Published in: Alexandria Engineering Journal, Vol 99, Pp 242-256 (2024)

The widespread implementation of surveillance systems on construction sites has led to the accumulation of vast amounts of visual data, highlighting the need for an effective semantic analysis methodology. Natural language, as the most intuitive mode…

External link: https://doaj.org/article/220ef99f59d34df7a9a00fd63a1e841a

Academic article

Dual modality prompt learning for visual question-grounded answering in robotic surgery

Authors: Yue Zhang, Wanshu Fan, Peixi Peng, Xin Yang, Dongsheng Zhou, Xiaopeng Wei

Published in: Visual Computing for Industry, Biomedicine, and Art, Vol 7, Iss 1, Pp 1-13 (2024)

With recent advancements in robotic surgery, notable strides have been made in visual question answering (VQA). Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content wi…

External link: https://doaj.org/article/aacb44afa9224e1da20c704418d07e75

Academic article

Visual Question Answering Bahasa Indonesia Berbasis Deep Learning untuk Pembelajaran Visual Anak TK [Deep learning-based Indonesian-language visual question answering for kindergarten children's visual learning]

Authors: Asiyah Hanifah, Rizka Wakhidatus Sholikah, R.V. Hari Ginardi

Published in: Techno.Com, Vol 23, Iss 1, Pp 136-148 (2024)

Indonesia is stepping up its preparations for digital transformation across many sectors, including education. One of the government's initiatives is the implementation of e-learning platforms in teaching and learning…

External link: https://doaj.org/article/cfcdecc4c35b4d8789989c38408784cb

Academic article

PERS: Parameter-Efficient Multimodal Transfer Learning for Remote Sensing Visual Question Answering

Authors: Jinlong He, Gang Liu, Pengfei Li, Xiaonan Su, Wenhua Jiang, Dongze Zhang, Shenjun Zhong

Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol 17, Pp 14823-14835 (2024)

Remote sensing (RS) visual question answering (VQA) provides accurate answers through the analysis of RS images (RSIs) and associated questions. Recent research has increasingly adopted transformers for feature extraction. However, this trend leads t…

External link: https://doaj.org/article/9b2880f8a088446b81905f73107a8f63

Academic article

Toward Unsupervised Visual Reasoning: Do Off-the-Shelf Features Know How to Reason?

Authors: Monika Wysoczanska, Tom Monnier, Tomasz Trzcinski, David Picard

Published in: IEEE Access, Vol 12, Pp 76367-76378 (2024)

Recent advances in visual representation learning allowed for the construction of a plethora of powerful features that are ready to use for numerous downstream tasks. Contrary to existing representation evaluations typically based on image or pixel-w…

External link: https://doaj.org/article/297e4dc099694f6285cd3ebc5051520e

Academic article

ConfigILM: A general purpose configurable library for combining image and language models for visual question answering

Authors: Leonard Hackel, Kai Norman Clasen, Begüm Demir

Published in: SoftwareX, Vol 26, Art. 101731 (2024)

ConfigILM is an open-source Python library for rapid iterative development of image-language models for visual question answering in PyTorch. It provides a convenient implementation for seamlessly combining image and language models from two popular…

External link: https://doaj.org/article/b331d46c8b80461e845b76125c5673f8
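The abstract does not show ConfigILM's interface, so no ConfigILM calls are used here. The code below is a library-independent sketch of what combining an image model and a language model for VQA commonly involves:

1. Project an image feature vector and a question feature vector into a shared space.
2. Fuse the two by element-wise multiplication.
3. Score a fixed answer vocabulary.

The fusion operator, the weight matrices and the function names are assumptions made for illustration. In practice the features would come from pretrained encoders and the weights would be learned.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fuse_and_answer(image_feat: Sequence[float], text_feat: Sequence[float],
                    W_img: Matrix, W_txt: Matrix, W_cls: Matrix,
                    answers: Sequence[str]) -> str:
    """Return the highest-scoring answer after element-wise fusion of both modalities."""
    if len(W_cls) != len(answers):
        raise ValueError("W_cls needs one row per candidate answer")
    z_img = matvec(W_img, image_feat)   # image features -> shared space
    z_txt = matvec(W_txt, text_feat)    # question features -> shared space
    if len(z_img) != len(z_txt):
        raise ValueError("both projections must map to the same dimension")
    fused = [a * b for a, b in zip(z_img, z_txt)]   # element-wise product fusion
    logits = matvec(W_cls, fused)                    # one score per candidate answer
    best = max(range(len(logits)), key=lambda i: logits[i])
    return answers[best]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(fuse_and_answer([2.0, 0.5], [1.0, 3.0], identity, identity, identity, ["yes", "no"]))
```

Treating VQA as classification over a fixed set of candidate answers is a common simplification. The element-wise product is only one possible fusion choice; concatenation and attention-based fusion are common alternatives.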

Academic article

Survey of Multimodal Medical Question Answering

Authors: Hilmi Demirhan, Wlodek Zadrozny

Published in: BioMedInformatics, Vol 4, Iss 1, Pp 50-74 (2023)

Multimodal medical question answering (MMQA) is a vital area bridging healthcare and Artificial Intelligence (AI). This survey methodically examines the MMQA research published in recent years. We collect academic literature through Google Scholar, a…

External link: https://doaj.org/article/c4308bd755224cb8889226ff316dee55

Academic article

A multi-scale contextual attention network for remote sensing visual question answering

Authors: Jiangfan Feng, Hui Wang

Published in: International Journal of Applied Earth Observation and Geoinformation, Vol 126, Art. 103641 (2024)

Remote sensing visual question answering (RSVQA) is a user-friendly method used for analyzing remote sensing images (RSIs) in various tasks. However, current methods often overlook geospatial objects, which possess a multi-scale representation and re…

External link: https://doaj.org/article/ce6b3aef0b2941349cc55c9abaf68a3d