Showing 1 - 10 of 299 results for search: '"Dong, Haoyu"'
We study a new problem setting of question answering (QA), referred to as DocTabQA. Within this setting, given a long document, the goal is to respond to questions by organizing the answers into structured tables derived directly from the document's
External link:
http://arxiv.org/abs/2408.11490
Segment Anything Model (SAM) has gained significant attention because of its ability to segment various objects in images given a prompt. The recently developed SAM 2 has extended this ability to video inputs. This opens an opportunity to apply SAM t
External link:
http://arxiv.org/abs/2408.00756
Authors:
Tian, Yuzhang; Zhao, Jianbo; Dong, Haoyu; Xiong, Junyu; Xia, Shiyu; Zhou, Mengyu; Lin, Yun; Cambronero, José; He, Yeye; Han, Shi; Zhang, Dongmei
Spreadsheets, with their extensive two-dimensional grids, various layouts, and diverse formatting options, present notable challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method
External link:
http://arxiv.org/abs/2407.09025
Authors:
Li, Binxu; Yan, Tiankai; Pan, Yuanting; Xu, Zhe; Luo, Jie; Ji, Ruiyang; Liu, Shilong; Dong, Haoyu; Lin, Zihao; Wang, Yixin
Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropri
External link:
http://arxiv.org/abs/2407.02483
Authors:
Xia, Shiyu; Xiong, Junyu; Dong, Haoyu; Zhao, Jianbo; Tian, Yuzhang; Zhou, Mengyu; He, Yeye; Han, Shi; Zhang, Dongmei
Published in:
Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR), Pages 116-128, August 2024
This paper explores capabilities of Vision Language Models on spreadsheet comprehension. We propose three self-supervised challenges with corresponding evaluation metrics to comprehensively evaluate VLMs on Optical Character Recognition (OCR), spatia
External link:
http://arxiv.org/abs/2405.16234
Due to the concise and structured nature of tables, the knowledge contained therein may be incomplete or missing, posing a significant challenge for table question answering (TableQA) and data analysis systems. Most existing datasets either fail to a
External link:
http://arxiv.org/abs/2405.08099
Automated segmentation is a fundamental medical image analysis task, which enjoys significant advances due to the advent of deep learning. While foundation models have been useful in natural language processing and some vision tasks for some time, th
External link:
http://arxiv.org/abs/2404.09957
Modern medical image translation methods use generative models for tasks such as the conversion of CT images to MRI. Evaluating these methods typically relies on some chosen downstream task in the target domain, such as segmentation. On the other han
External link:
http://arxiv.org/abs/2404.07318
Authors:
Dong, Haoyu; Tran, Tram Thi Minh; Verstegen, Rutger; Cazacu, Silvia; Gao, Ruolin; Hoggenmüller, Marius; Dey, Debargha; Franssen, Mervyn; Sasalovici, Markus; Bazilinskyy, Pavlo; Martens, Marieke
Human-Machine Interfaces (HMIs) for automated vehicles (AVs) are typically divided into two categories: internal HMIs for interactions within the vehicle, and external HMIs for communication with other road users. In this work, we examine the prospec
External link:
http://arxiv.org/abs/2403.19153
Authors:
Dong, Haoyu; Tran, Tram Thi Minh; Bazilinskyy, Pavlo; Hoggenmüller, Marius; Dey, Debargha; Cazacu, Silvia; Franssen, Mervyn; Gao, Ruolin
As the field of automated vehicles (AVs) advances, it has become increasingly critical to develop human-machine interfaces (HMI) for both internal and external communication. Critical dialogue is emerging around the potential necessity for a holistic
External link:
http://arxiv.org/abs/2403.11386