Analysis and classification of heart diseases using heartbeat features and machine learning algorithms
Author: | Hiam Alquran, Isam Abu-Qasmieh, Fajr Ibrahem |
---|---|
Language: | English |
Publication year: | 2019 |
Subject: | Information Systems and Management; lcsh:Computer engineering. Computer hardware; Heartbeat; Computer Networks and Communications; Computer science; Decision tree; lcsh:TK7885-7895; 02 engineering and technology; Machine learning; lcsh:QA75.5-76.95; Multiclass classification; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Heartbeats classification; lcsh:T58.5-58.64; lcsh:Information technology; Machine-learning libraries (MLlib); Spark–Scala; Random forest; Statistical classification; ComputingMethodologies_PATTERNRECOGNITION; Binary classification; Hardware and Architecture; Scalability; 020201 artificial intelligence & image processing; Artificial intelligence; Gradient boosting; Electrocardiogram (ECG); lcsh:Electronic computers. Computer science; Algorithm; Information Systems |
Source: | Journal of Big Data, Vol 6, Iss 1, Pp 1-15 (2019) |
ISSN: | 2196-1115 |
DOI: | 10.1186/s40537-019-0244-x |
Description: | This study proposes an electrocardiogram (ECG) classification approach that applies machine learning to several ECG features. An ECG is a signal that measures the electrical activity of the heart. The approach is implemented in Scala on the Apache Spark framework using MLlib, Apache Spark’s scalable machine learning library. The key challenge in ECG classification is handling irregularities in the ECG signals, which is essential for detecting the patient's status. We therefore propose an efficient approach that classifies ECG signals with high accuracy. Each heartbeat is a combination of action impulse waveforms produced by different specialized cardiac tissues. Heartbeat classification is difficult because these waveforms differ from one person to another; they are described by a set of features, which serve as the inputs to the machine learning algorithm. In general, Spark–Scala tools simplify the use of many algorithms, including machine learning (ML) algorithms. Spark–Scala is also preferred over other tools when the volume of data to be processed is very large. In our case, we used a dataset of 205,146 records to evaluate the performance of our approach. The machine learning libraries in Spark–Scala provide easy ways to implement many classification algorithms (Decision Tree, Random Forests, Gradient-Boosted Trees (GDB), etc.). The proposed method is evaluated and validated on the baseline MIT-BIH Arrhythmia and MIT-BIH Supraventricular Arrhythmia databases. The results show that our approach achieved an overall accuracy of 96.75% using the GDB Tree algorithm and 97.98% using Random Forest for binary classification. For multiclass classification, it achieved 98.03% accuracy using Random Forest; the Gradient Boosting tree supports only binary classification. |
Database: | OpenAIRE |
External link: |
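
The description above mentions training a Random Forest classifier on heartbeat features with Spark MLlib in Scala. The following is only a minimal sketch of what such a pipeline could look like, not the authors' implementation. The input file `heartbeat_features.csv`, the column name `label`, the 80/20 split, and the number of trees are all assumptions made for illustration; the record does not specify them.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object HeartbeatClassificationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HeartbeatClassificationSketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()

    // Hypothetical input: one row per heartbeat, numeric feature columns
    // plus a string column "label" holding the beat class.
    val beats = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("heartbeat_features.csv") // hypothetical path

    // Every column except the label is treated as a feature (assumption).
    val featureCols = beats.columns.filter(_ != "label")

    // Map class names to the numeric label indices MLlib expects;
    // rows whose class was not seen during fitting are skipped.
    val indexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("labelIndex")
      .setHandleInvalid("skip")

    // Pack the feature columns into a single vector column.
    val assembler = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")

    val forest = new RandomForestClassifier()
      .setLabelCol("labelIndex")
      .setFeaturesCol("features")
      .setNumTrees(50) // assumed value, not taken from the paper

    val pipeline = new Pipeline().setStages(Array(indexer, assembler, forest))

    // Assumed 80/20 train/test split with a fixed seed.
    val Array(train, test) = beats.randomSplit(Array(0.8, 0.2), seed = 42L)
    val model = pipeline.fit(train)
    val predictions = model.transform(test)

    // Overall accuracy on the held-out split.
    val accuracy = new MulticlassClassificationEvaluator()
      .setLabelCol("labelIndex")
      .setPredictionCol("prediction")
      .setMetricName("accuracy")
      .evaluate(predictions)

    println(s"Test accuracy: $accuracy")
    spark.stop()
  }
}
```

For the binary case, `RandomForestClassifier` could be swapped for MLlib's `GBTClassifier`, which, as the description notes, supports only two classes.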