Indexing Portuguese NLP Resources with PT-Pump-Up

Authors: Almeida, Rúben; Campos, Ricardo; Jorge, Alípio; Nunes, Sérgio
Year of publication: 2024
Subject:
Source: PROPOR 2024
Document type: Working Paper
Description: Recent advances in natural language processing (NLP) are linked to training processes that require vast corpora. Access to this data is often not trivial, because resources are dispersed and the infrastructures that host them must be kept online and up to date. New developments in NLP are often compromised by the scarcity of data or by the lack of a shared repository that works as an entry point for the community. This is especially true for low- and mid-resource languages, such as Portuguese, which lack data and proper resource management infrastructures. In this work, we propose PT-Pump-Up, a set of tools that aim to reduce resource dispersion and improve the accessibility of Portuguese NLP resources. Our proposal is divided into four software components: a) a web platform to list the available resources; b) a client-side Python package to simplify the loading of Portuguese NLP resources; c) an administrative Python package to manage the platform; and d) a public GitHub repository to foster future collaboration and contributions. All four components are accessible at: https://linktr.ee/pt_pump_up
Comment: Demo Track, 3 pages
Database: arXiv