CNN-based Server State Monitoring and Fault Diagnosis using Infrared Thermal Images

Authors: Beltus Wiysobunri Nkwawir, Hamza Salih Erden, Behcet Ugur Toreyin
Year of publication: 2022
DOI: 10.21203/rs.3.rs-1211668/v1
Description: The recent spike in the demand for high-performance computing (HPC) server systems has created many challenges in data centers (DCs), including thermal management, sustaining system reliability, and minimizing server failures. Lately, deep neural networks applied to infrared thermography (IRT) images have been successfully used for fault diagnosis in several fields. This paper evaluates seven state-of-the-art deep pretrained convolutional neural network (CNN)-based architectures and two shallow CNN-based architectures applied to server surface IRT images for the automatic diagnosis of five server operating conditions: partial CPU load, maximum CPU load, main fan failure, CPU fan failure, and server entrance blockage. Our approach is based on the concept of transfer learning, which involves two main stages. First, a CNN model classifier pretrained on the large ImageNet dataset is used to extract lower-level features. Second, the IRT images are used to fine-tune the higher levels of the CNN model classifier. A stratified five-fold cross-validation resampling method is used to evaluate the effectiveness and generalization of the nine architectures for five dataset split ratios. Results suggest that the CNN architectures achieve high prediction accuracy, with the majority exceeding 98% test accuracy across multiple split ratios. In addition, the diagnostic accuracy is significantly higher than that obtained using a traditional support vector machine classifier trained on handcrafted features. The effectiveness and robustness of the CNN-based algorithms can provide
Database: OpenAIRE
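
Note: The stratified five-fold cross-validation mentioned in the description can be illustrated with a minimal sketch. It is not the authors' code. The label strings, the `stratified_k_fold` helper, and the count of 20 images per condition are all hypothetical, chosen only to show how each fold keeps the same class proportions.

```python
import random
from collections import defaultdict

# Hypothetical label names for the five conditions listed in the description.
CONDITIONS = [
    "partial_cpu_load",
    "max_cpu_load",
    "main_fan_failure",
    "cpu_fan_failure",
    "entrance_blockage",
]


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class proportions."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle each class, then deal its indices round-robin across the folds.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    # Each fold serves once as the test set; the rest form the training set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for f, fold in enumerate(folds) if f != i for idx in fold
        )
        yield train_idx, test_idx


# Toy labels: 20 images per condition (an illustrative count, not the paper's data).
labels = [c for c in CONDITIONS for _ in range(20)]
splits = list(stratified_k_fold(labels, k=5, seed=0))
```

In a real pipeline, the index lists from each split would select the IRT images used to fine-tune and then test a classifier, and the per-fold accuracies would be averaged.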