Analysis and classification of heart diseases using heartbeat features and machine learning algorithms

Authors: Hiam Alquran, Isam Abu-Qasmieh, Fajr Ibrahem
Language: English
Publication year: 2019
Subject:
Information Systems and Management
lcsh:Computer engineering. Computer hardware
Heartbeat
Computer Networks and Communications
Computer science
Decision tree
lcsh:TK7885-7895
02 engineering and technology
Machine learning
computer.software_genre
lcsh:QA75.5-76.95
Multiclass classification
020204 information systems
0202 electrical engineering, electronic engineering, information engineering
Heartbeats classification
lcsh:T58.5-58.64
business.industry
lcsh:Information technology
Machine-learning libraries (MLlib)
Spark–Scala
Random forest
Statistical classification
ComputingMethodologies_PATTERNRECOGNITION
Binary classification
Hardware and Architecture
Scalability
020201 artificial intelligence & image processing
Artificial intelligence
Gradient boosting
Electrocardiogram (ECG)
lcsh:Electronic computers. Computer science
business
computer
Algorithm
Information Systems
Source: Journal of Big Data, Vol 6, Iss 1, Pp 1-15 (2019)
ISSN: 2196-1115
DOI: 10.1186/s40537-019-0244-x
Description: This study proposes an electrocardiogram (ECG) classification approach that uses machine learning on several ECG features. An ECG is a signal that measures the electrical activity of the heart. The approach is implemented in Scala on the Apache Spark framework using MLlib, Apache Spark’s scalable machine learning library. The key challenge in ECG classification is handling the irregularities in ECG signals, which is essential for detecting the patient's status. We therefore propose an efficient approach that classifies ECG signals with high accuracy. Each heartbeat is a combination of action impulse waveforms produced by different specialized cardiac tissues. Heartbeat classification is difficult because these waveforms differ from one person to another. The waveforms are described by a set of features, which serve as the inputs to the machine learning algorithm. In general, Spark–Scala tools simplify the use of many algorithms, including machine learning (ML) algorithms. Moreover, Spark–Scala is preferred over other tools when the volume of data to be processed is very large. In our case, we used a dataset of 205,146 records to evaluate the performance of our approach. The machine learning libraries in Spark–Scala provide easy ways to implement many classification algorithms (Decision Tree, Random Forest, Gradient-Boosted Trees (GDB), etc.). The proposed method is evaluated and validated on the baseline MIT-BIH Arrhythmia and MIT-BIH Supraventricular Arrhythmia databases. For binary classification, the approach achieved an overall accuracy of 96.75% using the GDB tree algorithm and 97.98% using Random Forest. For multiclass classification, it achieved 98.03% accuracy using Random Forest; the Gradient-Boosted Trees algorithm supports only binary classification.
Database: OpenAIRE