Conquering fashion MNIST with CNNs using computer vision by pretrained models: VGG19 and RESNET50.

Author: Venkataravanappa, Viswanatha, Chowdappa, Ramachandra Ankathattahalli, Shamanna, Madhukara, Krishnappa, Manjula, Mariyappa, Bavitesh, Singh, Abhishek Kumar
Subject:
Source: AIP Conference Proceedings; 2024, Vol. 3131 Issue 1, p1-12, 12p
Abstract: This paper examines the training and testing of two pre-trained Convolutional Neural Network (CNN) models for classifying clothing images from the Fashion MNIST dataset and compares the classification accuracy and performance of both models. Fashion MNIST is a collection of 28x28 grayscale images of 10 types of clothing; the images are well labeled and comparatively easy to classify, making the dataset a good starting point for learning about CNNs. The article begins with a summary of the dataset, which contains 60,000 training images and 10,000 test images across ten classes: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, and Ankle boot. The CNN architecture used in the experiments is described next: two convolutional layers, two max-pooling layers, and two fully connected layers. The convolutional layers use 32 3x3 filters, the max-pooling layers use a 2x2 pool size, the first fully connected layer has 128 neurons, and the output layer has 10 neurons. The models are trained with the Adam optimizer at a learning rate of 1e-5, for 30 epochs with a batch size of 64. The conclusion section highlights the limitations of the experiments: the CNN models may not perform as well on other datasets, and the experiments were conducted for only a single set of hyperparameters. The results section presents the outcomes: the training accuracies of the VGG19 and ResNet50 models are 94.58% and 99.41%, respectively, and the testing accuracies are 91.92% and 90.25%, respectively. The inference latencies for the two models were found to be 5.439 and 3.522, respectively. [ABSTRACT FROM AUTHOR]
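As a rough illustration of the training setup quoted in the abstract (Adam optimizer, learning rate 1e-5, 30 epochs, batch size 64, a 128-neuron fully connected layer and a 10-neuron output layer on Fashion MNIST with a pretrained VGG19 backbone), a minimal Keras sketch might look as follows. The preprocessing, classification head, and fine-tuning scheme are assumptions for illustration only; the abstract does not give the authors' code.

# Illustrative sketch only: reconstructs the setup described in the abstract.
# Everything beyond the quoted hyperparameters (Adam, lr=1e-5, 30 epochs,
# batch size 64, 128-unit dense layer, 10-class output) is an assumption.
import tensorflow as tf
from tensorflow.keras import layers, models

# Fashion MNIST: 60,000 training and 10,000 test 28x28 grayscale images, 10 classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# VGG19 expects 3-channel inputs of at least 32x32, so the grayscale images are
# upsampled and replicated across channels (an assumed preprocessing step).
def to_rgb(x):
    x = tf.expand_dims(tf.convert_to_tensor(x, dtype=tf.float32), -1)  # (N, 28, 28, 1)
    x = tf.image.resize(x, (32, 32))                                   # (N, 32, 32, 1)
    return tf.image.grayscale_to_rgb(x)                                # (N, 32, 32, 3)

x_train, x_test = to_rgb(x_train), to_rgb(x_test)

# Pretrained VGG19 backbone (ImageNet weights) with a small classification head.
backbone = tf.keras.applications.VGG19(include_top=False, weights="imagenet",
                                       input_shape=(32, 32, 3))
model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),    # 128-neuron fully connected layer, as in the abstract
    layers.Dense(10, activation="softmax"),  # 10-class output layer
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hyperparameters quoted in the abstract: 30 epochs, batch size 64.
model.fit(x_train, y_train, epochs=30, batch_size=64,
          validation_data=(x_test, y_test))

The ResNet50 variant reported in the abstract would follow the same pattern with tf.keras.applications.ResNet50 as the backbone; the reported accuracies and latencies come from the paper's own experiments, not from this sketch.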
Database: Complementary Index