Impact of ECG data format on the performance of machine learning models for the prediction of myocardial infarction.
Author: | Bellfield RAA; Data Science Research Centre, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, UK; Liverpool Centre for Cardiovascular Science at University of Liverpool, Liverpool John Moores University and Liverpool Heart & Chest Hospital, Liverpool, UK., Ortega-Martorell S; Data Science Research Centre, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, UK; Liverpool Centre for Cardiovascular Science at University of Liverpool, Liverpool John Moores University and Liverpool Heart & Chest Hospital, Liverpool, UK., Lip GYH; Liverpool Centre for Cardiovascular Science at University of Liverpool, Liverpool John Moores University and Liverpool Heart & Chest Hospital, Liverpool, UK; Department of Clinical Medicine, Aalborg University, Denmark., Oxborough D; Liverpool Centre for Cardiovascular Science at University of Liverpool, Liverpool John Moores University and Liverpool Heart & Chest Hospital, Liverpool, UK; School of Sport and Exercise Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, UK., Olier I; Data Science Research Centre, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, UK; Liverpool Centre for Cardiovascular Science at University of Liverpool, Liverpool John Moores University and Liverpool Heart & Chest Hospital, Liverpool, UK. Electronic address: I.A.OlierCaparroso@ljmu.ac.uk. |
---|---|
Language: | English |
Source: | Journal of electrocardiology [J Electrocardiol] 2024 May-Jun; Vol. 84, pp. 17-26. Date of Electronic Publication: 2024 Mar 07. |
DOI: | 10.1016/j.jelectrocard.2024.03.005 |
Abstract: | Background: We aim to determine which electrocardiogram (ECG) data format is optimal for machine learning (ML) modelling in the context of myocardial infarction prediction. We also address the auxiliary objective of evaluating the viability of using digitised ECG signals for ML modelling. Methods: Two ECG arrangements, displaying 10 s and 2.5 s of data for each lead, were used. For each arrangement, conservative and speculative data cohorts were generated from the PTB-XL dataset, with 8358 and 11,621 ECGs in the conservative and speculative cohorts, respectively. All ECGs were represented in three data formats: Signal ECGs, Image ECGs, and Extracted Signal ECGs. ML models were trained on each of the three data formats in both data cohorts. Results: For ECGs containing 10 s of data, the Signal and Extracted Signal ECG formats were optimal and statistically similar, with AUCs [95% CI] of 0.971 [0.961, 0.981] and 0.974 [0.965, 0.984], respectively, for the conservative cohort, and 0.931 [0.918, 0.945] and 0.919 [0.903, 0.934], respectively, for the speculative cohort. For ECGs containing 2.5 s of data, the Image ECG format was optimal, with AUCs of 0.960 [0.948, 0.973] and 0.903 [0.886, 0.920] for the conservative and speculative cohorts, respectively. Conclusion: When available, Signal ECG data should be preferred for ML modelling. Otherwise, the optimal format depends on the data arrangement within the ECG: if the Image ECG contains 10 s of data for each lead, the Extracted Signal ECG is optimal; if it contains only 2.5 s per lead, the Image ECG is optimal for ML performance. (Copyright © 2023. Published by Elsevier Inc.) |
Database: | MEDLINE |
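
The abstract contrasts ECGs that show 10 s of data per lead with ones that show only 2.5 s per lead. The record does not describe how the 2.5 s arrangement was built, so the following is only a minimal sketch under an assumption: that it follows the conventional 3×4 paper printout, where leads I/II/III occupy 0 to 2.5 s, aVR/aVL/aVF 2.5 to 5 s, V1 to V3 5 to 7.5 s, and V4 to V6 7.5 to 10 s. The function name `to_2p5s_arrangement`, the `LAYOUT_COLUMNS` table, the (samples, leads) array shape and the 500 Hz default are illustrative assumptions, not the authors' code; the long rhythm strip that printouts usually add is not modelled.

```python
import numpy as np

# Assumed lead order (the usual 12-lead ordering, as used by PTB-XL).
LEADS = ["I", "II", "III", "aVR", "aVL", "aVF",
         "V1", "V2", "V3", "V4", "V5", "V6"]

# Hypothetical 3x4 printout: each column shows one consecutive 2.5 s window.
LAYOUT_COLUMNS = [
    ["I", "II", "III"],
    ["aVR", "aVL", "aVF"],
    ["V1", "V2", "V3"],
    ["V4", "V5", "V6"],
]


def to_2p5s_arrangement(signal: np.ndarray, fs: int = 500) -> dict:
    """Reduce a (n_samples, 12) 10 s recording to 2.5 s per lead.

    Each lead keeps only the time window shown in its printout column,
    mimicking what an image with the 3x4 arrangement would contain.
    """
    n_samples, n_leads = signal.shape
    if n_leads != len(LEADS):
        raise ValueError(f"expected {len(LEADS)} leads, got {n_leads}")
    window = int(2.5 * fs)  # samples in one 2.5 s segment
    if n_samples < window * len(LAYOUT_COLUMNS):
        raise ValueError("recording is shorter than 10 s")

    segments = {}
    for col, leads in enumerate(LAYOUT_COLUMNS):
        start, stop = col * window, (col + 1) * window
        for lead in leads:
            segments[lead] = signal[start:stop, LEADS.index(lead)]
    return segments
```

For a 10 s recording sampled at 500 Hz (5000 × 12 samples), every lead is reduced to 1250 samples, so each lead in the 2.5 s arrangement carries a quarter of the samples available in the 10 s arrangement.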