Rastro-DM: data mining with a trail
Author: | de Castro, Marcus Vinicius Borela; Balaniuk, Remis |
---|---|
Year of publication: | 2024 |
Subject: | |
Source: | Revista do TCU (Brazilian Federal Court of Accounts), 145 (2021): 79-106 |
Document type: | Working Paper |
Description: | This paper proposes Rastro-DM (Trail Data Mining), a methodology for documenting data mining (DM) projects that focuses not on the generated model but on the processes behind its construction, so as to leave a trail (Rastro in Portuguese) of planned actions, training completed, results obtained, and lessons learned. The proposed practices complement structuring DM methodologies such as CRISP-DM, which establish a methodological and paradigmatic framework for the DM process. The application of these best practices and their benefits is illustrated in a project called 'Cladop', created to classify PDF documents associated with the investigative process of damages to the Brazilian Federal Public Treasury. Building the Rastro-DM kit in the context of a project is a small step that can lead to an institutional leap, achieved by sharing and using the trail across the enterprise. Comment: Published in the Brazilian Federal Court of Accounts Journal, no. 145, in 2021 (https://revista.tcu.gov.br/ojs/index.php/RTCU/article/view/1733) |
Database: | arXiv |
External link: |