Abstrakt: |
The surge in data generation has prompted a shift to big data, challenging the notion that "more data equals better performance" due to processing and time constraints. In this evolving artificial intelligence and machine learning landscape, instance selection (IS) has become crucial for data reduction without compromising model quality. Traditional IS methods, though efficient, often struggle with large, complex datasets in data mining. This study evaluates graph reduction techniques, grounded in graph theory, as a novel approach for instance selection. The objective is to leverage the inherent structures of data represented as graphs to enhance the effectiveness of instance selection. We evaluated 35 graph reduction techniques across 29 classification datasets. These techniques were assessed based on various metrics, including accuracy, F1 score, reduction rate, and computational times. Graph reduction methods showed significant potential in maintaining data integrity while achieving substantial reductions. Top techniques achieved up to 99% reduction while maintaining or improving accuracy. For instance, the Multilevel sampling achieved an accuracy effectiveness score of 0.8555 with 99.16% reduction on large datasets, while Leiden sampling showed high effectiveness on smaller datasets (0.8034 accuracy, 97.87% reduction). Computational efficiency varied widely, with reduction times ranging from milliseconds to minutes. This research advances the theory of graph-based instance selection and offers practical application guidelines. Our findings indicate graph reduction methods effectively preserve data quality and boost processing efficiency in large, complex datasets, with some techniques achieving up to 160-fold speedups in model training at high reduction rates. [ABSTRACT FROM AUTHOR] |