Empirical Study of Feature Selection Methods in Regression for Large-Scale Healthcare Data: A Case Study on Estimating Dental Expenditures

Authors:	Veena Mayya, Christian King, Giang T. Vu, Varadraj Gurupur
Language:	English
Year of publication:	2024
Subject:	Dental care; clinical decision support systems; dental visits; feature selection; machine learning; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol 12, Pp 153564-153579 (2024)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3482192
Description:	The complexity and high dimensionality of healthcare data present substantial challenges in building machine learning (ML) models, given the large number of variables such as patient demographics and medical history. Effective feature selection is crucial to address issues such as increased computational resource requirements, longer training times, overfitting, and reduced model interpretability. This study evaluates a range of feature selection methods to identify the most impactful features for predicting dental expenditures using publicly available Medical Expenditure Panel Survey (MEPS) data. Sixteen ML models are assessed to determine the top-performing model, after which state-of-the-art filter, wrapper, embedded, and hybrid feature selection techniques are applied to optimize the feature set. The highest performance, in terms of the coefficient of determination ($R^{2}$), is achieved using a hybrid feature selection method that combines the mutual information filter with the embedded features from the CatBoost regressor. The results indicate that the proposed system is suitable for real-time deployment even with reduced features, providing potential benefits such as minimizing the need for irrelevant and difficult-to-obtain features. Moreover, automated feature selection significantly enhances model performance, yielding an $R^{2}$ score of 0.86, compared with a score of 0.73 achieved with carefully selected manual features. Additionally, to enhance the interpretability of the top-performing ML model, explanatory visualizations are employed to examine the influence of key features on predicting dental expenditures.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/500680133f6e4736a7c38b33a921d7c0
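
The hybrid approach named in the description (a mutual information filter combined with embedded feature importances from a boosted-tree regressor) can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the article: the abstract does not say how the two stages are combined, so the sketch simply runs the filter first and the embedded ranking second; scikit-learn's `GradientBoostingRegressor` stands in for CatBoost; the data are synthetic, not MEPS; and the function name `hybrid_select` and the cut-offs `n_filter` and `n_final` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import mutual_info_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def hybrid_select(X, y, n_filter, n_final, seed=0):
    """Return column indices kept by a filter stage followed by an embedded stage.

    Hypothetical helper: the sequential combination is an assumption.
    """
    # Filter stage: rank every feature by its estimated mutual information
    # with the target and keep the n_filter highest-scoring columns.
    mi = mutual_info_regression(X, y, random_state=seed)
    filtered = np.argsort(mi)[::-1][:n_filter]

    # Embedded stage: fit a boosted-tree regressor (stand-in for CatBoost)
    # on the surviving columns and keep the n_final most important ones.
    model = GradientBoostingRegressor(random_state=seed).fit(X[:, filtered], y)
    order = np.argsort(model.feature_importances_)[::-1][:n_final]
    return filtered[order]


# Synthetic stand-in data (not MEPS): 30 features, 5 of them informative.
X, y = make_regression(
    n_samples=500, n_features=30, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

selected = hybrid_select(X_train, y_train, n_filter=15, n_final=5)

# Refit on the reduced feature set and score on held-out data with R^2.
final_model = GradientBoostingRegressor(random_state=0)
final_model.fit(X_train[:, selected], y_train)
r2 = r2_score(y_test, final_model.predict(X_test[:, selected]))

print("selected columns:", sorted(selected.tolist()))
print("held-out R^2:", round(r2, 3))
```

Selection is performed on the training split only, so the held-out $R^{2}$ is not influenced by the test rows. The numbers this sketch prints describe the synthetic data and are not comparable to the 0.86 and 0.73 reported in the article.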