Machine Learning Using Template-Based-Predicted Structure of Haemagglutinin Predicts Pathogenicity of Avian Influenza.

Authors: Shin JH; Department of Microbiology, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea., Kim SJ; Department of Microbiology, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea., Kim G; Department of Biomedical Sciences; BK21 FOUR Biomedical Science Project; Medical Research Institute, Seoul National University College of Medicine, Seoul 03080, Republic of Korea., Kim HR; Department of Biomedical Sciences; Department of Anatomy & Cell Biology; BK21 FOUR Biomedical Science Project; Medical Research Institute, Seoul National University College of Medicine, Seoul 03080, Republic of Korea., Ko KS; Department of Microbiology, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea.
Language: English
Source: Journal of microbiology and biotechnology [J Microbiol Biotechnol] 2024 Oct 28; Vol. 34 (10), pp. 2033-2040. Date of Electronic Publication: 2024 Aug 06.
DOI: 10.4014/jmb.2405.05022
Abstract: Deep learning presents a promising approach to complex biological classifications, contingent upon the availability of well-curated datasets. This study addresses the challenge of analyzing three-dimensional protein structures by introducing a novel pipeline that utilizes open-source tools to convert protein structures into a format amenable to computational analysis. Applying a two-dimensional convolutional neural network (CNN) to a dataset of 12,143 avian influenza virus genomes from 64 countries, encompassing 119 hemagglutinin (HA) and neuraminidase (NA) types, we achieved significant classification accuracy. The pathogenicity was determined based on the presence of H5 or H7 subtypes, and our models, ranging from zero to six mid-layers, indicated that a four-layer model most effectively identified highly pathogenic strains, with accuracies over 0.9. To enhance our approach, we incorporated Principal Component Analysis (PCA) for dimensionality reduction and one-class SVM for abnormality detection, improving model robustness through bootstrapping. Furthermore, the K-nearest neighbor (K-NN) algorithm was fine-tuned via hyperparameter optimization to corroborate the findings. The PCA identified distinct clustering for pathogenic HA, yielding an AUC of up to 0.85. The optimized K-NN model demonstrated an impressive accuracy between 0.96 and 0.97. These combined methodologies underscore our deep learning framework's capacity for rapid and precise identification of pathogenic avian influenza strains, thus providing a critical tool for managing global avian influenza threats.
Database: MEDLINE
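
Illustrative note: the abstract mentions that a K-NN classifier was "fine-tuned via hyperparameter optimization" but does not say how. The sketch below shows one common way this can be done, choosing the number of neighbours k by leave-one-out cross-validation. It is not the authors' pipeline: the two-dimensional toy points merely stand in for structure-derived HA features, the labels 0/1 are hypothetical stand-ins for low/high pathogenicity, and the function names (`knn_predict`, `loo_accuracy`, `tune_k`, `make_toy_data`) are invented for this example.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points nearest to `query` (Euclidean)."""
    nearest = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy: classify each point using all the others."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


def tune_k(X, y, candidates):
    """Return the candidate k with the best leave-one-out accuracy.

    Ties are broken in favour of the smaller k.
    """
    scores = {k: loo_accuracy(X, y, k) for k in candidates}
    best_k = max(scores, key=lambda k: (scores[k], -k))
    return best_k, scores


def make_toy_data(n_per_class=40, seed=0):
    """Two Gaussian blobs; purely synthetic, NOT influenza data."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            y.append(label)
    return X, y


# Odd k values avoid tied votes in a two-class problem.
X, y = make_toy_data()
best_k, scores = tune_k(X, y, [1, 3, 5, 7, 9])
print("best k:", best_k)
print("leave-one-out accuracy per k:", scores)
```

In practice the candidate grid, the distance metric, and the validation scheme would all be chosen to suit the real feature space; the accuracies printed here describe only the synthetic blobs and say nothing about the figures reported in the abstract.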