Description: |
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly transmissible coronavirus that has caused a deadly global pandemic. The WHO reported that it spreads by contact, droplet, airborne, fomite, fecal-oral, bloodborne, mother-to-child, and animal-to-human transmission. Viral shedding therefore has a major impact on the pandemic. This study uses transcriptome data from coronavirus disease 2019 (COVID-19) patients to predict prolonged viral shedding in each patient. Using transcriptome features alone, the lowest root mean squared error (16.3±3.3) was obtained with the top 25 features chosen by forward feature selection and a linear regression model. To assess the effect of a few non-molecular features, each was then added to the model one at a time alongside the selected transcriptome features; these features had no impact on the prediction of prolonged viral shedding. The study also predicts the days since onset in the same way. Here too, the top 25 transcriptome features chosen by forward feature selection gave comparably good accuracy (0.74±0.1), but the best accuracy (0.78±0.1) was obtained with the top 20 features ranked by SVM feature importance. Moreover, adding non-molecular features had a large impact on this prediction when the transcriptome features were selected by mutual information. |