Search results - "Staar, Peter"

Report

Statements: Universal Information Extraction from Tables with Large Language Models for ESG KPIs

Authors: Mishra, Lokesh, Dhibi, Sohayl, Kim, Yusik, Ramis, Cesar Berrospi, Gupta, Shubham, Dolfi, Michele, Staar, Peter

Published in: Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024), pages 193-214, Bangkok, Thailand. Association for Computational Linguistics

Environment, Social, and Governance (ESG) KPIs assess an organization's performance on issues such as climate change, greenhouse gas emissions, water consumption, waste management, human rights, diversity, and policies. ESG reports convey this valuable…

External link: http://arxiv.org/abs/2406.19102

Report

INDUS: Effective and Efficient Language Models for Scientific Applications

Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks…

External link: http://arxiv.org/abs/2405.10725

Report

KVP10k : A Comprehensive Dataset for Key-Value Pair Extraction in Business Documents

Authors: Naparstek, Oshri, Pony, Roi, Shapira, Inbar, Dahood, Foad Abo, Azulai, Ophir, Yaroker, Yevgeny, Rubinstein, Nadav, Lysak, Maksym, Staar, Peter, Nassar, Ahmed, Livathinos, Nikolaos, Auer, Christoph, Amrani, Elad, Friedman, Idan, Prince, Orit, Burshtein, Yevgeny, Goldfarb, Adi Raz, Barzelay, Udi

In recent years, the challenge of extracting information from business documents has emerged as a critical task, finding applications across numerous domains. This effort has attracted substantial interest from both industry and academia, highlighting…

External link: http://arxiv.org/abs/2405.00505

Report

ESG Accountability Made Easy: DocQA at Your Service

Authors: Mishra, Lokesh, Berrospi, Cesar, Dinkla, Kasper, Antognini, Diego, Fusco, Francesco, Bothur, Benedikt, Lysak, Maksym, Livathinos, Nikolaos, Nassar, Ahmed, Vagenas, Panagiotis, Morin, Lucas, Auer, Christoph, Dolfi, Michele, Staar, Peter

Published in: AAAI 2024, vol. 38, pp. 23814-23816

We present Deep Search DocQA. This application enables information extraction from documents via a question-answering conversational assistant. The system integrates several technologies from different AI disciplines consisting of document conversion…

External link: http://arxiv.org/abs/2311.18481

Report

MolGrapher: Graph-based Visual Recognition of Chemical Structures

Authors: Morin, Lucas, Danelljan, Martin, Agea, Maria Isabel, Nassar, Ahmed, Weber, Valery, Meijer, Ingmar, Staar, Peter, Yu, Fisher

The automatic analysis of chemical literature has immense potential to accelerate the discovery of new materials and drugs. Much of the critical information in patent documents and scientific articles is contained in figures, depicting the molecule s…

External link: http://arxiv.org/abs/2308.12234

Report

ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents

Authors: Auer, Christoph, Nassar, Ahmed, Lysak, Maksym, Dolfi, Michele, Livathinos, Nikolaos, Staar, Peter

Transforming documents into machine-processable representations is a challenging task due to their complex structures and variability in formats. Recovering the layout structure and content from PDF files or scanned material has remained a key problem…

External link: http://arxiv.org/abs/2305.14962

Report

Optimized Table Tokenization for Table Structure Recognition

Authors: Lysak, Maksym, Nassar, Ahmed, Livathinos, Nikolaos, Auer, Christoph, Staar, Peter

Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) approaches…

External link: http://arxiv.org/abs/2305.03393

Report

Unsupervised Term Extraction for Highly Technical Domains

Authors: Fusco, Francesco, Staar, Peter, Antognini, Diego

Term extraction is an information extraction task at the root of knowledge discovery platforms. Developing term extractors that are able to generalize across very diverse and potentially highly technical domains is challenging, as annotations for dom…

External link: http://arxiv.org/abs/2210.13118

Report

BusiNet -- a Light and Fast Text Detection Network for Business Documents

Authors: Naparstek, Oshri, Azulai, Ophir, Rotman, Daniel, Burshtein, Yevgeny, Staar, Peter, Barzelay, Udi

For digitizing or indexing physical documents, Optical Character Recognition (OCR), the process of extracting textual information from scanned documents, is a vital technology. When a document is visually damaged or contains non-textual elements, existing…

External link: http://arxiv.org/abs/2207.01220
