Comparison of naïve Bayes and logistic regression for computer-aided diagnosis of breast masses using ultrasound imaging

Authors: Cary, Theodore W., Cwanger, Alyssa, Venkatesh, Santosh S., Conant, Emily F., Sehgal, Chandra M.
Source: Proceedings of SPIE; February 2012, Vol. 8320 Issue: 1 p83200M-83200M-7, 748808p
Abstract: This study compares the performance of two proven but very different machine learners, Naïve Bayes and logistic regression, for differentiating malignant and benign breast masses using ultrasound imaging. Ultrasound images of 266 masses were analyzed quantitatively for shape, echogenicity, margin characteristics, and texture features. These features, along with patient age, race, and mammographic BI-RADS category, were used to train Naïve Bayes and logistic regression classifiers to diagnose lesions as malignant or benign. ROC analysis was performed using all of the features and using only a subset that maximized information gain. Performance was determined by the area under the ROC curve, Az, obtained from leave-one-out cross-validation. Naïve Bayes showed significant variation (Az 0.733 ± 0.035 to 0.840 ± 0.029, P < 0.002) with the choice of features, but the performance of logistic regression was relatively unchanged under feature selection (Az 0.839 ± 0.029 to 0.859 ± 0.028, P = 0.605). Out of 34 features, a subset of 6 gave the highest information gain: brightness difference, margin sharpness, depth-to-width, mammographic BI-RADS, age, and race. The probabilities of malignancy determined by Naïve Bayes and logistic regression after feature selection showed significant correlation (R² = 0.87, P < 0.0001). The diagnostic performance of Naïve Bayes and logistic regression can be comparable, but logistic regression is more robust. Since probability of malignancy cannot be measured directly, high correlation between the probabilities derived from two basic but dissimilar models increases confidence in the predictive power of machine learning models for characterizing solid breast masses on ultrasound.
Database: Supplemental Index
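
The abstract reports two summary statistics: the area under the ROC curve (Az) for each classifier and the R² between the two classifiers' malignancy probabilities. The following is a minimal sketch of how such quantities can be computed from held-out predictions, not the authors' implementation. The function names and the six-case toy data are hypothetical and chosen only for illustration. The AUC is computed here through its rank (Mann-Whitney) interpretation, and R² is assumed to be the squared Pearson correlation; the record does not say which estimators the study used.

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the fraction of (malignant, benign) pairs in which the
    malignant case gets the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data: 1 = malignant, 0 = benign. In a leave-one-out
# setting, each probability would come from a model trained on all
# other cases.
labels = [1, 1, 1, 0, 0, 0]
nb_probs = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]  # stand-in for Naive Bayes output
lr_probs = [0.8, 0.6, 0.5, 0.4, 0.3, 0.2]  # stand-in for logistic regression output

auc_nb = auc(labels, nb_probs)
auc_lr = auc(labels, lr_probs)
agreement = r_squared(nb_probs, lr_probs)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a data set of a few hundred masses; a rank-based formulation would scale better for larger sets.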