Abstract: |
Extracting and classifying data from the rulings of the Moroccan National Tax Appeals Commission (NTAC) is complex, and automated tools for this task are essentially non-existent in the Moroccan legal and tax domain. Extraction of rulings data relies heavily on manual labour, which is inefficient, time-consuming, and prone to mistakes. Tools for automating the processing of tax appeals decisions (TAD) have been suggested, but applying a generic natural language processing model to domain-specific items is difficult, as is the lack of training text data. In this paper, we developed a text extraction system to boost productivity and to create a database for analysis and prediction. Our study aims to automate data extraction and classification using regular expressions (REGEX) and the BERT algorithm. From 562 rulings (1999-2018) on tax irregularities, we extracted 201 corporate tax-related decisions and 550 disputes on corporate tax headings. Our model achieved strong results, with a precision of 99.1% and an accuracy of 98.6%. |