Popis: |
Machine learning models are now used for decision making across many domains. While accuracy is the most important property of any model, training time is also crucial in many applications. Some of the most powerful classification algorithms, such as the Support Vector Machine (SVM) and the Multilayer Perceptron (MLP), are often computationally intensive and consume large amounts of memory and time during training. These issues are compounded by the large datasets that are common today. In this paper, we use a hybrid approach in which clustering algorithms first reduce the training dataset to an arbitrary target size, and complex classification algorithms are then trained on this tractable yet informative reduced set. The benefits are twofold: lower memory requirements and shorter training time. We test this approach on four datasets and compare the performance of clustering algorithms such as K-means, Mini-batch K-means, and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), as well as hashing techniques, when combined with SVM and MLP. We find that this hybrid approach offers a considerable speedup in training time without a heavy loss of accuracy. |
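The reduction step of the hybrid approach can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not say how clusters are turned into training samples, so the sketch assumes that each class is clustered separately and that the cluster centroids, carrying the class label, form the reduced set. The function names `kmeans` and `reduce_training_set`, the parameter `per_class`, and the toy data are all hypothetical. It uses only the Python standard library; in practice a library clusterer (K-means, Mini-batch K-means, or BIRCH) would likely replace the hand-written loop, and the reduced set would be passed to an SVM or MLP.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length float tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def reduce_training_set(X, y, per_class):
    """Assumed reduction rule: replace each class by at most `per_class` labelled centroids."""
    X_red, y_red = [], []
    for label in sorted(set(y)):
        members = [x for x, t in zip(X, y) if t == label]
        k = min(per_class, len(members))
        for c in kmeans(members, k):
            X_red.append(c)
            y_red.append(label)
    return X_red, y_red


# Toy data (hypothetical): two well-separated 2-D classes of 200 points each.
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)] + \
    [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(200)]
y = [0] * 200 + [1] * 200

# 400 labelled samples are reduced to 20 labelled centroids.
X_small, y_small = reduce_training_set(X, y, per_class=10)
```

Here `X_small` and `y_small` stand in for the full training set when fitting the expensive classifier. The `per_class` parameter controls the size of the reduced set, which is how a sketch like this would trade accuracy against training time and memory.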