A new preprocessing method reduces the dimensionality of classification models

Authors: Fatima El Barakaz, Abdelmajid El Moutaouakkil, Omar Boutkhoum
Publication year: 2019
Subject:
Source: Proceedings of the 4th International Conference on Big Data and Internet of Things.
DOI: 10.1145/3372938.3373005
Description: In data mining classification problems, the more features a training set has, the harder it is to visualize. Working with a dataset that has many varied factors is a major problem for data analysts, even after classification models have been applied, especially when the features that produced a given classification cannot be identified. Often, many of these features are correlated and hence redundant. Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. An online marketplace can have many sources of revenue; one strategy, used by Amazon, consists of allocating shops to users and guaranteeing them a minimum number of visitors. Using automated classification models to target those professional users is very advantageous, with minimal costs, which come mainly from data storage. In data mining, supervised classification is the technique of assigning an instance to one of a set of predefined classes. Many classification algorithms can take considerable execution time when the dataset is very large. This paper presents a new preprocessing method that consists of two parts: the first normalizes the data, and the second removes insignificant attributes before classification. We applied this method to three supervised classifiers: SVM, Naive Bayes, and decision tree. The objective is to achieve the same classification performance for each classifier with fewer dimensions and less execution time, which reduces the data stored for the marketplace. The results of the proposed method were very satisfactory.
Database: OpenAIRE
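
The two-part preprocessing described in the abstract (normalize the data, then remove insignificant attributes before classification) could be sketched roughly as follows. The record does not say how the authors decide that an attribute is insignificant, so this sketch assumes min-max normalization and treats an attribute as insignificant when the absolute Pearson correlation between it and the class label falls below a threshold. The function names, the threshold value, and the toy data are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def min_max_normalize(rows):
    """Scale every column to [0, 1]; a constant column becomes all zeros."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def significant_attributes(rows, labels, threshold=0.1):
    """Assumed criterion: keep columns with |correlation to label| >= threshold."""
    return [
        j for j, col in enumerate(zip(*rows))
        if abs(pearson(col, labels)) >= threshold
    ]


def preprocess(rows, labels, threshold=0.1):
    """Part 1: normalize. Part 2: drop attributes judged insignificant."""
    normalized = min_max_normalize(rows)
    kept = significant_attributes(normalized, labels, threshold)
    reduced = [[row[j] for j in kept] for row in normalized]
    return reduced, kept


# Toy data: column 0 tracks the label, column 1 is constant,
# column 2 is uncorrelated with the label.
rows = [[1, 10, 5], [2, 10, 3], [3, 10, 3], [4, 10, 5]]
labels = [0, 0, 1, 1]
reduced, kept = preprocess(rows, labels)
print(kept)
print(reduced)
```

The reduced matrix would then be passed to whichever classifier is being evaluated (SVM, Naive Bayes, or decision tree in the abstract), and its accuracy and run time compared with training on the full attribute set.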