Abstract: |
The goal of this research was to demonstrate the full data science lifecycle through a use case: applying the MobileNetV2 model to a vehicle image classification task, evaluated on several validation and test sets of differing difficulty. Several model variants were used, each designed to recognize images of ground vehicles and assign them to one of five classes: car, truck, motorcycle, bicycle, or bus. The highest validation accuracy was obtained by the model trained on uniformly designed train and val sets (with data normalization and augmentation), where the train set also contained the validation set. This model also achieved the highest accuracy on both test sets. The superiority of MODEL 3 BASELINE is confirmed by other metrics as well: test loss, F1-score, AUC, and confusion matrices (for both test sets). The comparison between MODEL 1 BASELINE and MODEL 2 BASELINE depended on the test set (1 or 2) and on the metric considered, so it was not possible to declare one method of dataset preparation superior to the other (original class distribution [no data normalization and no data augmentation] versus uniformly designed [with data normalization and augmentation]). The article also presents challenges and findings: the problems, key issues, and solutions that arose during data collection and tagging, as well as during the preparation and evaluation of the model. [ABSTRACT FROM AUTHOR] |