Search results

Report

DocTabQA: Answering Questions from Long Documents Using Tables

Authors: Wang, Haochen; Hu, Kai; Dong, Haoyu; Gao, Liangcai

We study a new problem setting of question answering (QA), referred to as DocTabQA. Within this setting, given a long document, the goal is to respond to questions by organizing the answers into structured tables derived directly from the document's

External link: http://arxiv.org/abs/2408.11490


Report

Segment anything model 2: an application to 2D and 3D medical images

Authors: Dong, Haoyu; Gu, Hanxue; Chen, Yaqian; Yang, Jichen; Chen, Yuwen; Mazurowski, Maciej A.

Segment Anything Model (SAM) has gained significant attention because of its ability to segment various objects in images given a prompt. The recently developed SAM 2 has extended this ability to video inputs. This opens an opportunity to apply SAM t

External link: http://arxiv.org/abs/2408.00756


Report

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Authors: Tian, Yuzhang; Zhao, Jianbo; Dong, Haoyu; Xiong, Junyu; Xia, Shiyu; Zhou, Mengyu; Lin, Yun; Cambronero, José; He, Yeye; Han, Shi; Zhang, Dongmei

Spreadsheets, with their extensive two-dimensional grids, various layouts, and diverse formatting options, present notable challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method

External link: http://arxiv.org/abs/2407.09025


Report

MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

Authors: Li, Binxu; Yan, Tiankai; Pan, Yuanting; Xu, Zhe; Luo, Jie; Ji, Ruiyang; Liu, Shilong; Dong, Haoyu; Lin, Zihao; Wang, Yixin

Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropri

External link: http://arxiv.org/abs/2407.02483


Report

Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities

Authors: Xia, Shiyu; Xiong, Junyu; Dong, Haoyu; Zhao, Jianbo; Tian, Yuzhang; Zhou, Mengyu; He, Yeye; Han, Shi; Zhang, Dongmei

Published in: Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR), pages 116-128, August 2024

This paper explores capabilities of Vision Language Models on spreadsheet comprehension. We propose three self-supervised challenges with corresponding evaluation metrics to comprehensively evaluate VLMs on Optical Character Recognition (OCR), spatia

External link: http://arxiv.org/abs/2405.16234


Report

KET-QA: A Dataset for Knowledge Enhanced Table Question Answering

Authors: Hu, Mengkang; Dong, Haoyu; Luo, Ping; Han, Shi; Zhang, Dongmei

Due to the concise and structured nature of tables, the knowledge contained therein may be incomplete or missing, posing a significant challenge for table question answering (TableQA) and data analysis systems. Most existing datasets either fail to a

External link: http://arxiv.org/abs/2405.08099


Report

How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model

Authors: Gu, Hanxue; Dong, Haoyu; Yang, Jichen; Mazurowski, Maciej A.

Automated segmentation is a fundamental medical image analysis task, which enjoys significant advances due to the advent of deep learning. While foundation models have been useful in natural language processing and some vision tasks for some time, th

External link: http://arxiv.org/abs/2404.09957


Report

Rethinking Perceptual Metrics for Medical Image Translation

Authors: Konz, Nicholas; Chen, Yuwen; Gu, Hanxue; Dong, Haoyu; Mazurowski, Maciej A.

Modern medical image translation methods use generative models for tasks such as the conversion of CT images to MRI. Evaluating these methods typically relies on some chosen downstream task in the target domain, such as segmentation. On the other han

External link: http://arxiv.org/abs/2404.07318


Report

Exploring Holistic HMI Design for Automated Vehicles: Insights from a Participatory Workshop to Bridge In-Vehicle and External Communication

Authors: Dong, Haoyu; Tran, Tram Thi Minh; Verstegen, Rutger; Cazacu, Silvia; Gao, Ruolin; Hoggenmüller, Marius; Dey, Debargha; Franssen, Mervyn; Sasalovici, Markus; Bazilinskyy, Pavlo; Martens, Marieke

Human-Machine Interfaces (HMIs) for automated vehicles (AVs) are typically divided into two categories: internal HMIs for interactions within the vehicle, and external HMIs for communication with other road users. In this work, we examine the prospec

External link: http://arxiv.org/abs/2403.19153


Report

Holistic HMI Design for Automated Vehicles: Bridging In-Vehicle and External Communication

Authors: Dong, Haoyu; Tran, Tram Thi Minh; Bazilinskyy, Pavlo; Hoggenmüller, Marius; Dey, Debargha; Cazacu, Silvia; Franssen, Mervyn; Gao, Ruolin

As the field of automated vehicles (AVs) advances, it has become increasingly critical to develop human-machine interfaces (HMI) for both internal and external communication. Critical dialogue is emerging around the potential necessity for a holistic

External link: http://arxiv.org/abs/2403.11386

