An Improved Data Discretization Algorithm based on Rough Sets Theory

Authors: Shu Yan, Liu Han, Miaoqiong Wang, Kai Wei, Jiang Chunyu
Publication year: 2020
Subject:
Source: ISPA/BDCloud/SocialCom/SustainCom
Description: Extracting the latent core value from large volumes of data is central to big data processing. In this study, we propose a data discretization algorithm, the improved class-attribute contingency coefficient (ICACC) algorithm, based on rough set theory and the class attribute strain coefficient standard. ICACC effectively reduces the data distortion rate by selecting correct discretization points and by adding a constraint on the data inconsistency rate. In tests with the C4.5 algorithm, ICACC achieves a nearly 10% improvement in recognition rate and calculation accuracy over traditional algorithms.
Database: OpenAIRE
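
Note: The description mentions a constraint on the data inconsistency rate but does not define it. As a rough illustration only, the sketch below uses a definition common in the discretization literature: samples whose discretized attribute values are identical form a group, and every sample that does not belong to its group's majority class counts as inconsistent. Whether ICACC uses exactly this definition is an assumption; the function names, cut points, and toy data are hypothetical and do not come from the paper.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def discretize(value, cut_points):
    """Return the index of the interval that `value` falls into.

    `cut_points` must be sorted in ascending order.
    """
    return bisect_right(cut_points, value)


def inconsistency_rate(rows, labels):
    """Fraction of samples that disagree with the majority class of
    the group of samples sharing the same discretized attribute values."""
    if not labels:
        return 0.0
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row)][label] += 1
    # In each group, everything outside the majority class is inconsistent.
    inconsistent = sum(
        sum(counts.values()) - max(counts.values())
        for counts in groups.values()
    )
    return inconsistent / len(labels)


# Hypothetical toy data: one continuous attribute and a class label.
values = [1.0, 1.5, 2.0, 3.5, 4.0, 4.5]
labels = ["a", "a", "b", "b", "b", "a"]

# Two candidate discretization schemes (cut points chosen by hand).
coarse = [[discretize(v, [3.0])] for v in values]
fine = [[discretize(v, [1.75, 4.25])] for v in values]

print(inconsistency_rate(coarse, labels))
print(inconsistency_rate(fine, labels))
```

In this toy case the single cut point mixes classes inside both intervals, while the two-cut scheme separates them completely, so a constraint of this kind would reject the coarser scheme in favor of the finer one.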