Computational Method for Classification of Avian Influenza A Virus Using DNA Sequence Information and Physicochemical Properties

Author:	Dong-Qing Wei, Abbas Khan, Nasim Fawad, Arif Ali, Ali Farhan, Sahar Fazal, Fahad Humayun, Fatima Khan, Shazia Shamas
Publication year:	2020
Subject:	0301 basic medicine; lcsh:QH426-470; K-nearest neighbor; Computer science; Feature extraction; 0211 other engineering and technologies; Decision tree; Feature selection; 02 engineering and technology; k-nearest neighbors algorithm; 03 medical and health sciences; Naive Bayes classifier; decision tree; Genetics; support vector machine; k-gram; Genetics (clinical); Original Research; discrete wavelet transform; 021103 operations research; multivariate mutual information; business.industry; Decision tree learning; Pattern recognition; Naïve Bayes; Support vector machine; lcsh:Genetics; 030104 developmental biology; Avian influenza A Virus; Molecular Medicine; Artificial intelligence; business; F1 score
Source:	Frontiers in Genetics, Vol 12 (2021)
ISSN:	1664-8021
Description:	Accurate and fast characterization of the subtype sequences of Avian influenza A virus (AIAV) hemagglutinin (HA) and neuraminidase (NA) depends on expanding diagnostic services and is embedded in molecular epidemiological studies. A new approach for classifying the AIAV sequences of the HA and NA genes into subtypes using DNA sequence data and physicochemical properties is proposed. The method requires only unaligned, full-length or partial sequences of HA or NA DNA as input. It allows for quick and highly accurate assignment of HA sequences to subtypes H1–H16 and NA sequences to subtypes N1–N9. For feature extraction, k-gram, discrete wavelet transformation, and multivariate mutual information were used, and different classifiers were trained for prediction. Four classifiers, Naïve Bayes, Support Vector Machine (SVM), K nearest neighbor (KNN), and Decision Tree, were compared using our feature selection method. The comparison is based on the 30% of the original dataset that was set aside for testing. Among the four classifiers, Decision Tree performed best, with Precision, Recall, F1 score, and Accuracy of 0.9514, 0.9535, 0.9524, and 0.9571, respectively. Decision Tree showed considerable improvements over the other three classifiers using our method. The results show that the proposed feature selection method, when combined with a Decision Tree classifier, gives the most accurate prediction of the AIAV subtype.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::bb48b6cb8b880a08ae8826cf4e7bb2c1 https://pubmed.ncbi.nlm.nih.gov/33584824
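
Note on the k-gram features named in the description: the record does not give the value of k or any implementation detail, so the following is only a minimal sketch of how an unaligned DNA sequence could be turned into a fixed-length k-gram frequency vector. The function name `kgram_features`, the default `k=3`, and the choice to skip windows containing ambiguous bases are illustrative assumptions, not the authors' method.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: only unambiguous DNA bases are counted


def kgram_features(sequence, k=3):
    """Return normalized k-gram frequencies in a fixed lexicographic order.

    Hypothetical helper for illustration; windows containing characters
    outside ALPHABET (for example 'N') are ignored.
    """
    seq = sequence.upper()
    # All 4**k possible k-grams, so every sequence maps to the same vector length.
    kgrams = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[g] for g in kgrams)
    # Normalize so full-length and partial sequences are comparable.
    return [counts[g] / total if total else 0.0 for g in kgrams]


if __name__ == "__main__":
    vector = kgram_features("ATGAAGGCAATACTAGTAGTT", k=2)
    print(len(vector), round(sum(vector), 6))
```

Because the vector length depends only on k, sequences of different lengths need no alignment before being passed to a classifier, which matches the input requirement stated in the description.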