Machine learning discrimination of Gleason scores below GG3 and above GG4 for HSPC patients diagnosis

Author:	Bingyu Zhu, Longguo Dai, Huijian Wang, Kun Zhang, Chongjian Zhang, Yang Wang, Feiyu Yin, Ji Li, Enfa Ning, Qilin Wang, Libo Yang, Hong Yang, Ruiqian Li, Jun Li, Chen Hu, Hongyi Wu, Haiyang Jiang, Yu Bai
Language:	English
Year of publication:	2024
Subject:	Machine learning; Gleason score; Diagnostic model; Medicine; Science
Source:	Scientific Reports, Vol 14, Iss 1, Pp 1-19 (2024)
Document type:	article
ISSN:	2045-2322
DOI:	10.1038/s41598-024-77033-1
Description:	Abstract: This study aims to develop machine learning (ML)-assisted models for analyzing datasets related to Gleason scores in prostate cancer, to conduct statistical analyses on those datasets, and to identify meaningful features. We retrospectively collected data from 717 hormone-sensitive prostate cancer (HSPC) patients at Yunnan Cancer Hospital; data from 526 of these patients were used for modeling. Seven auxiliary models were established using Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), and an artificial neural network (ANN), based on 21 clinical biochemical indicators and features. Evaluation metrics were accuracy (ACC), precision (PRE), specificity (SPE), sensitivity (SEN, also called recall), F1 score, and area under the curve (AUC), visualized using confusion matrices and ROC curves. The ensemble learning methods RF, XGBoost, and AdaBoost performed best. RF achieved a training dataset score of 0.769 (95% CI: 0.759–0.835) and a testing dataset score of 0.755 (95% CI: 0.660–0.760) (AUC: 0.786, 95% CI: 0.722–0.803), while XGBoost achieved a training dataset score of 0.755 (95% CI: 0.711–0.809) and a testing dataset score of 0.745 (95% CI: 0.660–0.764) (AUC: 0.777, 95% CI: 0.726–0.798). AdaBoost scored 0.789 on the training dataset (95% CI: 0.782–0.857) and 0.774 on the testing dataset (95% CI: 0.651–0.774) (AUC: 0.799, 95% CI: 0.703–0.802). In terms of feature importance (FI), bone metastases at first visit, prostatic volume, age, and T1-T2 account for the largest proportions in RF; fPSA, TPSA, and tumor burden account for the largest proportions in AdaBoost; and f/TPSA, LDH, and testosterone have the highest proportions in XGBoost.
Our findings indicate that ensemble learning methods perform well in classifying HSPC patient data, with TNM staging and fPSA being important classification indicators. These findings provide a useful reference for distinguishing between Gleason scores, supporting more accurate patient assessment and personalized treatment planning.
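The evaluation metrics named in the abstract all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; it is not the authors' code, the function name `binary_metrics` is hypothetical, and the counts are made-up illustration values rather than study data. It assumes every denominator is nonzero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true-positive, false-positive, true-negative and
    false-negative counts. Assumes no denominator is zero.
    """
    acc = (tp + tn) / (tp + fp + tn + fn)   # accuracy (ACC)
    pre = tp / (tp + fp)                    # precision (PRE)
    sen = tp / (tp + fn)                    # sensitivity (SEN), i.e. recall
    spe = tn / (tn + fp)                    # specificity (SPE)
    f1 = 2 * pre * sen / (pre + sen)        # harmonic mean of PRE and SEN
    return {"ACC": acc, "PRE": pre, "SEN": sen, "SPE": spe, "F1": f1}


# Hypothetical counts for illustration only (not taken from the study).
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included because it depends on the ranking of predicted scores across all thresholds, not on a single confusion matrix.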
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/f834aacd378242aea95d39f1ae2b4974