Showing 1 - 10 of 70 for search: '"Staar, Peter"'
Authors:
Auer, Christoph, Lysak, Maksym, Nassar, Ahmed, Dolfi, Michele, Livathinos, Nikolaos, Vagenas, Panos, Ramis, Cesar Berrospi, Omenetti, Matteo, Lindlbauer, Fabian, Dinkla, Kasper, Mishra, Lokesh, Kim, Yusik, Gupta, Shubham, de Lima, Rafael Teixeira, Weber, Valery, Morin, Lucas, Meijer, Ingmar, Kuropiatnyk, Viktor, Staar, Peter W. J.
This technical report introduces Docling, an easy to use, self-contained, MIT-licensed open-source package for PDF document conversion. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recogn
External link:
http://arxiv.org/abs/2408.09869
Authors:
Mishra, Lokesh, Dhibi, Sohayl, Kim, Yusik, Ramis, Cesar Berrospi, Gupta, Shubham, Dolfi, Michele, Staar, Peter
Published in:
Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024), pages 193-214, Bangkok, Thailand. Association for Computational Linguistics
Environment, Social, and Governance (ESG) KPIs assess an organization's performance on issues such as climate change, greenhouse gas emissions, water consumption, waste management, human rights, diversity, and policies. ESG reports convey this valuab
External link:
http://arxiv.org/abs/2406.19102
Authors:
Bhattacharjee, Bishwaranjan, Trivedi, Aashka, Muraoka, Masayasu, Ramasubramanian, Muthukumaran, Udagawa, Takuma, Gurung, Iksha, Zhang, Rong, Dandala, Bharath, Ramachandran, Rahul, Maskey, Manil, Bugbee, Kaylin, Little, Mike, Fancher, Elizabeth, Sanders, Lauren, Costes, Sylvain, Blanco-Cuaresma, Sergi, Lockhart, Kelly, Allen, Thomas, Grezes, Felix, Ansdell, Megan, Accomazzi, Alberto, El-Kurdi, Yousef, Wertheimer, Davis, Pfitzmann, Birgit, Ramis, Cesar Berrospi, Dolfi, Michele, de Lima, Rafael Teixeira, Vagenas, Panagiotis, Mukkavilli, S. Karthik, Staar, Peter, Vahidinia, Sanaz, McGranaghan, Ryan, Mehrabian, Armin, Lee, Tsendgar
Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks
External link:
http://arxiv.org/abs/2405.10725
Authors:
Naparstek, Oshri, Pony, Roi, Shapira, Inbar, Dahood, Foad Abo, Azulai, Ophir, Yaroker, Yevgeny, Rubinstein, Nadav, Lysak, Maksym, Staar, Peter, Nassar, Ahmed, Livathinos, Nikolaos, Auer, Christoph, Amrani, Elad, Friedman, Idan, Prince, Orit, Burshtein, Yevgeny, Goldfarb, Adi Raz, Barzelay, Udi
In recent years, the challenge of extracting information from business documents has emerged as a critical task, finding applications across numerous domains. This effort has attracted substantial interest from both industry and academy, highlighting
External link:
http://arxiv.org/abs/2405.00505
Authors:
Mishra, Lokesh, Berrospi, Cesar, Dinkla, Kasper, Antognini, Diego, Fusco, Francesco, Bothur, Benedikt, Lysak, Maksym, Livathinos, Nikolaos, Nassar, Ahmed, Vagenas, Panagiotis, Morin, Lucas, Auer, Christoph, Dolfi, Michele, Staar, Peter
Published in:
AAAI 2024, 38, 23814-23816
We present Deep Search DocQA. This application enables information extraction from documents via a question-answering conversational assistant. The system integrates several technologies from different AI disciplines consisting of document conversion
External link:
http://arxiv.org/abs/2311.18481
Authors:
Morin, Lucas, Danelljan, Martin, Agea, Maria Isabel, Nassar, Ahmed, Weber, Valery, Meijer, Ingmar, Staar, Peter, Yu, Fisher
The automatic analysis of chemical literature has immense potential to accelerate the discovery of new materials and drugs. Much of the critical information in patent documents and scientific articles is contained in figures, depicting the molecule s
External link:
http://arxiv.org/abs/2308.12234
Authors:
Auer, Christoph, Nassar, Ahmed, Lysak, Maksym, Dolfi, Michele, Livathinos, Nikolaos, Staar, Peter
Transforming documents into machine-processable representations is a challenging task due to their complex structures and variability in formats. Recovering the layout structure and content from PDF files or scanned material has remained a key proble
External link:
http://arxiv.org/abs/2305.14962
Extracting tables from documents is a crucial task in any document conversion pipeline. Recently, transformer-based models have demonstrated that table-structure can be recognized with impressive accuracy using Image-to-Markup-Sequence (Im2Seq) appro
External link:
http://arxiv.org/abs/2305.03393
Term extraction is an information extraction task at the root of knowledge discovery platforms. Developing term extractors that are able to generalize across very diverse and potentially highly technical domains is challenging, as annotations for dom
External link:
http://arxiv.org/abs/2210.13118
Authors:
Naparstek, Oshri, Azulai, Ophir, Rotman, Daniel, Burshtein, Yevgeny, Staar, Peter, Barzelay, Udi
For digitizing or indexing physical documents, Optical Character Recognition (OCR), the process of extracting textual information from scanned documents, is a vital technology. When a document is visually damaged or contains non-textual elements, exi
External link:
http://arxiv.org/abs/2207.01220