Showing 1 - 10 of 4,577 results for search: '"A. Vivoli"'
Although foundational vision-language models (VLMs) have proven to be very successful for various semantic discrimination tasks, they still struggle to perform faithfully for fine-grained categorization. Moreover, foundational models trained on one d
External link:
http://arxiv.org/abs/2409.01835
The comic domain is rapidly advancing with the development of single-page analysis and synthesis models. However, evaluation metrics and datasets lag behind, often limited to small-scale or single-style test sets. We introduce a novel benchmark, CoMi
External link:
http://arxiv.org/abs/2407.03550
Author:
Vivoli, Emanuele, Campaioli, Irene, Nardoni, Mariateresa, Biondi, Niccolò, Bertini, Marco, Karatzas, Dimosthenis
Comics, as a medium, uniquely combine text and images in styles often distinct from real-world visuals. For the past three decades, computational research on comics has evolved from basic object detection to more sophisticated tasks. However, the fie
External link:
http://arxiv.org/abs/2407.03540
This work explores a closure task in comics, a medium where visual and textual elements are intricately intertwined. Specifically, Text-cloze refers to the task of selecting the correct text to use in a comic panel, given its neighboring panels. Trad
External link:
http://arxiv.org/abs/2403.03719
Published in:
Il Foro Italiano, 1998 Sep 01. 121(9), 2375/2376-2379/2380.
External link:
https://www.jstor.org/stable/23194279
Holographic imaging is a technique that uses microwave energy to create a three-dimensional image of an object or scene. This technology has potential applications in land mine detection, as the long-wavelength microwave energy can penetrate the grou
External link:
http://arxiv.org/abs/2303.15335
Relevant information in documents is often summarized in tables, helping the reader to identify useful facts. Most benchmark datasets support either document layout analysis or table understanding, but do not provide data to apply both tasks in a
External link:
http://arxiv.org/abs/2302.01451
Published in:
Il Foro Italiano, 1905 Jan 01. 30, 177/178-179/180.
External link:
https://www.jstor.org/stable/23107486
In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion. Specifically, we consider the task of Scene Text Visual Question Answering (STVQA) in which the question
External link:
http://arxiv.org/abs/2209.06730
Tables are widely used in many types of documents because they convey important information in a structured way. In scientific papers, tables can sum up novel discoveries and summarize experimental results, making the research comparable and easi
External link:
http://arxiv.org/abs/2208.11203