Efficient attribute reduction from the viewpoint of discernibility
Author: | Afeng Yang, Mi He, Min Lu, Jun Zhang, Shuhua Teng, Yongjian Nian |
---|---|
Year of publication: | 2016 |
Subject: | Information Systems and Management; Heuristic; Machine learning; Computer Science Applications; Theoretical Computer Science; Set (abstract data type); Reduction (complexity); Artificial Intelligence; Control and Systems Engineering; Pattern recognition (psychology); Preprocessor; Attribute domain; Data mining; Rough set; Decision table; Software; Mathematics |
Source: | Information Sciences. 326:297-314 |
ISSN: | 0020-0255 |
Description: | Attribute reduction is an important preprocessing step in pattern recognition, machine learning and data mining. Rough set theory offers a useful and formal methodology for attribute reduction that retains the discernibility power of the original datasets; attribute reduction has therefore been studied extensively in rough set theory. However, the inefficiency of existing attribute reduction algorithms limits the application of rough sets. In this paper, we first analyse the limitations of existing attribute reduction algorithms. Then, a novel measure of attribute quality, called the relative discernibility degree, is proposed based on discernibility. Theoretical analysis shows that this measure can find relative dispensable attributes and remains unchanged after the relative dispensable attributes and redundant objects are removed during attribute selection. This property can be used to reduce the search space and accelerate the heuristic process of attribute reduction. Consequently, a new attribute reduction algorithm is proposed from the viewpoint of discernibility. Furthermore, the relationships among the reduction definitions of the algebra view, information view and discernibility view are derived, and some non-equivalent relationships among these views of rough set theory in inconsistent decision tables are identified. Numerical experiments were conducted on UCI datasets. The experimental results show that the proposed algorithm is effective and efficient and is applicable to large-scale datasets. |
Database: | OpenAIRE |
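
The description above speaks of attribute reduction that preserves the discernibility of a decision table. As a rough illustration of that general idea only, the following Python sketch applies a simple greedy heuristic: it repeatedly picks the condition attribute that tells apart the most not-yet-separated object pairs with different decisions, then drops any attribute that turns out to be redundant. This is not the paper's algorithm, and it does not implement the relative discernibility degree, whose definition is not given in this record. The function names `discerned_pairs` and `greedy_reduct` and the toy table are assumptions made for the example.

```python
from itertools import combinations


def discerned_pairs(rows, decisions, attr):
    """Return object pairs (i, j) with different decisions that `attr` tells apart."""
    return {
        (i, j)
        for i, j in combinations(range(len(rows)), 2)
        if decisions[i] != decisions[j] and rows[i][attr] != rows[j][attr]
    }


def greedy_reduct(rows, decisions):
    """Greedy, discernibility-preserving attribute selection (illustrative sketch)."""
    n_attrs = len(rows[0])
    per_attr = {a: discerned_pairs(rows, decisions, a) for a in range(n_attrs)}

    # Pairs the full attribute set can discern. Pairs that no attribute
    # separates (inconsistent objects) are left out by construction.
    target = set().union(*per_attr.values())

    remaining = set(target)
    chosen = []
    while remaining:
        # Pick the attribute covering the most remaining pairs;
        # ties go to the lowest attribute index.
        best = max(per_attr, key=lambda a: (len(per_attr[a] & remaining), -a))
        chosen.append(best)
        remaining -= per_attr[best]

    # Backward pass: drop an attribute if the others still discern every target pair.
    for a in list(chosen):
        rest = [b for b in chosen if b != a]
        if set().union(*(per_attr[b] for b in rest)) == target:
            chosen = rest
    return sorted(chosen)


# Hypothetical toy decision table: the decision is the XOR of attributes 0 and 1,
# and attribute 2 is constant, so it cannot discern anything.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
decisions = [0, 1, 1, 0]
print(greedy_reduct(rows, decisions))  # [0, 1]
```

Enumerating all object pairs costs quadratic time in the number of objects, which is exactly the kind of inefficiency the description says motivates faster measures; the sketch favours readability over scale.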