Abstract
Heartbeat classification is a crucial tool for arrhythmia diagnosis. In this study, a multi-feature pseudo-color mapping (MfPc Mapping) was proposed, and a lightweight FlexShuffleNet was designed to classify heartbeats. MfPc Mapping converts one-dimensional (1-D) electrocardiogram (ECG) recordings into corresponding two-dimensional (2-D) multi-feature RGB graphs, offering excellent interpretability and data visualization. FlexShuffleNet is a lightweight network that can be adapted to classification tasks of varying complexity by tuning hyperparameters. The method has three steps. The first step is data preprocessing, which includes de-noising the raw ECG recordings, removing baseline drift, extracting heartbeats, and performing data balancing. The second step transforms the heartbeats using MfPc Mapping. Finally, FlexShuffleNet is employed to classify heartbeats into 14 categories. The method was evaluated on the test set of the MIT-BIH arrhythmia database (MIT/BIH DB), yielding an accuracy of 99.77%, precision of 94.60%, sensitivity of 89.83%, specificity of 99.85% and F1-score of 0.9125 on the 14-category classification task. Additionally, validation on the Shandong Province Hospital database (SPH DB) yielded an accuracy of 92.08%, precision of 93.63%, sensitivity of 92.69%, specificity of 91.25% and F1-score of 0.9315. The results show the satisfactory performance of the proposed method.
Keywords: Heartbeats classification, Feature fusion, Multi-feature pseudo-color mapping, Convolutional neural network
Introduction
For decades, cardiovascular diseases (CVDs) have been the leading cause of death globally: 17.9 million people died of CVDs in 2019, accounting for 32% of global deaths [1]. ECG recordings capture the electrical activity of the heart, providing a comprehensive assessment of its electrical function. It is estimated that as many as 300 million ECGs are recorded annually in Europe alone [2]. This vast amount of ECG recordings highlights the importance of computer-aided interpretation. Highly accurate computer-aided interpretation can save clinical specialists a great deal of time and effort and reduce the number of misdiagnoses. Therefore, automatic heartbeat classification is a key research tool for cardiac arrhythmia analysis and one of the challenges in the analysis of ECG recordings [3, 4].
For arrhythmia analysis, classical methods combine machine learning with various features empirically extracted from one-dimensional (1-D) ECG recordings. Martis et al. [5] automatically classified normal and abnormal beats using higher-order spectral cumulants of wavelet packet decomposition and a support vector machine (SVM) with a kernel function. Thilagavathy et al. [6] employed the discrete wavelet transform to extract features from heartbeats, which were subsequently input into an SVM for heartbeat classification. Yang et al. [7] extracted features such as the R–R interval, morphological features and wavelet transform coefficients; these features were then fed into an ensemble multiclass classifier, namely the mixed-kernel-based extreme learning machine–random forest–one-vs.-one, to classify heartbeats. When extracting features, wavelet packet decomposition [5] and the wavelet transform [6, 7] generate a large number of features, resulting in a high-dimensional feature space. This high dimensionality increases the difficulty of model training and may introduce feature redundancy; some redundant features are not only uninformative but also reduce classification accuracy. Collectively, in classical machine learning methods, the quality of features can significantly impact classification performance. Therefore, deep learning algorithms have been extensively applied in ECG analysis, as they can successfully learn complex representative features of ECG recordings with little or no dependence on manual feature extraction.
Currently, some studies have employed deep learning to extract abstract features from 1-D ECG recordings. Jin et al. [8] proposed an atrial fibrillation (AF) detector based on a twin-attentional convolutional long short-term memory neural network for AF classification, which extracted features from ECG recordings and provided interpretability. Hua et al. [9] proposed an R-R-R strategy that retains the ECG data between the R peaks immediately before and after the current R peak; this strategy was employed to segment the original ECG recordings into segments for training and testing a 1-D CNN. Hasan et al. [10] employed empirical mode decomposition and the higher-order intrinsic mode functions to form a modified ECG recording; a 1-D CNN using the modified recordings as input was then employed for heartbeat classification. The aforementioned methods avoid feature extraction based on expert experience and achieve satisfactory generalization ability. However, 1-D data contains very limited information, from which deep learning methods can hardly learn effectively. In contrast, 2-D data can better capture spatial relationships and thus represent more information and more complex patterns and structures. Therefore, researchers have tried to convert 1-D ECG recordings into corresponding 2-D data through various methods to obtain better performance. Li et al. [11] proposed an S-shaped reconstruction method to convert 1-D ECG recordings into corresponding 2-D diagrams, which were then fed into a 2-D 19-layer deep squeeze-and-excitation residual network (SE-ResNet) for recognizing cardiac arrhythmias. Li et al. [12] proposed a Z-shaped signal reconstruction to convert a 1-D ECG recording into a 2-D ECG matrix, which was then used as input for a self-complementary attentional convolutional neural network. Mathunjwa et al. [13] converted the original ECG recordings into recurrence plots, employing the resulting 2-D graphs as inputs for a CNN classifier. Zhang et al. [14] used hybrid time-frequency analysis to generate 2-D time-frequency graphs; transfer learning with ResNet-101 was then employed for arrhythmia identification. These methods obtain better performance by transforming the original ECG recordings into 2-D form and applying 2-D CNNs. However, the reconstruction methods [11, 12] simply lay out the ECG recordings in 2-D matrices, which still contain limited information and do not allow visualization of the data. The recurrence plot [13] and the hybrid time-frequency analysis [14] do achieve data visualization, but the resulting graphs are still not visually interpretable. In addition, the above methods rely only on abstract features extracted by deep learning for arrhythmia analysis and do not achieve multi-feature fusion, limiting the potential for further performance improvement.
Based on the above analysis, a MfPc Mapping was introduced in this study to convert 1-D original ECG recordings into corresponding 2-D multi-feature RGB graphs, which contain statistical, dynamical, primitive, and morphological features of heartbeats. These features are consistent with human visual perception habits (e.g., different features can be easily distinguished by color as well as by morphology), realizing excellent data visualization. On this basis, when combined with class activation mapping (CAM), the basis of the model's classification can be pinpointed and interpretability achieved, which is of practical significance for clinical use. With this feature-fusion approach, the complementary nature of different features can be exploited to enhance the representation of the data and provide more comprehensive information, thus improving the predictive performance of the model. Rapid processing and analysis of ECG recordings can help doctors make a diagnosis in the shortest possible time and take the necessary therapeutic measures, thus improving patient survival rates. It is therefore crucial to reduce computational resource consumption and enhance the real-time processing capability of the model in resource-constrained environments. ShuffleNet v2 [15] recognizes that inference time is affected by memory access cost (MAC), not just model complexity, and accordingly proposes four principles for lightweight network design. Its results show that ShuffleNet v2 largely outperforms other networks, especially under small computational budgets, indicating that it surpasses traditional CNNs in lightweight design, enabling faster inference and higher accuracy. The human brain can understand its surroundings quickly because it mines the most important information from a large amount of information [16].
Some features play a decisive role in the model's classification, while others have almost no effect. This difference motivates the attention mechanism, whereby deep learning models focus on the more effective features rather than the ineffective ones to improve classification performance. SimAM [17] is a lightweight, parameter-free attention mechanism that generates attention weights by calculating the local self-similarity of the feature maps without introducing any additional parameters, which can effectively improve the performance of a CNN while maintaining a lightweight design. Therefore, in this study, ShuffleNet v2 is chosen as the backbone, and based on it FlexShuffleNet is proposed, which incorporates the SimAM attention mechanism and can flexibly control the structure of the network. By adjusting the hyperparameter controlling the network structure, FlexShuffleNet can achieve a better trade-off between speed and accuracy and adapt to different levels of classification complexity. For example, on the MIT/BIH DB and the SPH DB, different hyperparameters were chosen so that FlexShuffleNet matched classification tasks of different complexity and yielded satisfactory performance. The contributions of this study are listed below:
The MfPc Mapping converts 1-D data into 2-D multi-feature RGB graphs, which achieves multi-feature fusion, data visualization, and interpretability.
FlexShuffleNet adapts to different levels of classification complexity by tuning the hyperparameters that control its structure, with higher parameter efficiency.
The proposed method was applied to arrhythmia classification and obtained satisfactory results.
This paper is organized as follows: Materials and methods Section introduces the data and method, covering data preprocessing, MfPc Mapping and FlexShuffleNet. Results Section provides the classification results. Discussion Section analyzes the proposed methods, including comparison of the classification results with previous works, the ablation experiments and description of the interpretability. Conclusion Section summarizes the work.
Materials and Methods
Database
The MIT-BIH arrhythmia database (MIT/BIH DB) [17] on PhysioNet contains 48 ECG records, each 30 min long, from 47 individuals. Each recording was sampled at 360 Hz. In the ECG recordings, each heartbeat is identified by the position of the R wave, and all heartbeats were classified into one of 15 categories by at least two cardiologists [18]. The categories include normal beat (N), left bundle branch block beat (L), right bundle branch block beat (R), atrial escape beat (e), nodal (junctional) escape beat (j), atrial premature beat (A), aberrant atrial premature beat (a), nodal (junctional) premature beat (J), premature ventricular contraction (V), ventricular escape beat (E), fusion of ventricular and normal beat (F), paced beat (/), fusion of paced and normal beat (f), unclassifiable beat (Q) and premature or ectopic supraventricular beat (S).
The Shandong Province Hospital database (SPH DB) contains 24-hour ambulatory 12-lead ECG recordings, sampled at 200 Hz, from five individuals captured by a Holter wearable ECG device; the recordings were labeled into atrial fibrillation (AF) and non-atrial fibrillation (NAF) categories by clinical experts. This study used lead II to validate the proposed method, and the study was approved by the Ethics Committee of Shandong Provincial Hospital.
Method overview
Figure 1 shows the flowchart of the proposed method, which consists of three modules: data preprocessing, the MfPc Mapping and the FlexShuffleNet. The original ECG recordings were preprocessed by removing noise, segmenting heartbeats and balancing the data. During the MfPc Mapping, three artificial features, i.e., the amplitude, the peak and the difference between neighboring points, were extracted from each heartbeat. Based on these features, the corresponding RGB values were calculated to generate 2-D multi-feature RGB graphs. Finally, the graphs were fed into the FlexShuffleNet for 14-category classification.
Fig. 1.
The flowchart of the proposed method
Data preprocessing
Removing noise and Heartbeat segmentation
The ECG recordings from the MIT/BIH DB contain 50/60 Hz power-line noise, muscle artifacts and baseline wander (BW). Therefore, Daubechies wavelet 6 filters [19] are employed to remove the noise, muscle artifacts and BW to ensure satisfactory accuracy.
Heartbeats were segmented according to the annotation files in the MIT/BIH DB, which provide the category of each heartbeat and the location of its R wave. Each heartbeat was segmented into 236 points: 108 points before the R peak and 128 points after it. Subsequently, all heartbeat segments were normalized using Z-score normalization to address amplitude scaling and remove offset effects [20]. Finally, a total of 105037 ECG heartbeats were extracted.
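The segmentation and normalization steps above can be sketched in Python as follows; this is a minimal illustration, and the function and variable names are our own rather than the paper's:

```python
import numpy as np

def segment_heartbeats(ecg, r_peaks, before=108, after=128):
    """Cut fixed-length windows around annotated R peaks and Z-score each one.

    `before`/`after` follow the paper: 108 samples before the R peak and
    128 samples after it, giving 236-point segments.
    """
    beats = []
    for r in r_peaks:
        if r - before < 0 or r + after > len(ecg):
            continue  # skip beats too close to the recording edges
        seg = ecg[r - before:r + after].astype(float)
        # Z-score normalization removes amplitude scaling and offset effects
        seg = (seg - seg.mean()) / (seg.std() + 1e-8)
        beats.append(seg)
    return np.stack(beats)

# toy usage: a synthetic recording with two annotated "R peak" positions
ecg = np.sin(np.linspace(0, 20 * np.pi, 2000))
beats = segment_heartbeats(ecg, r_peaks=[500, 1200])
```

In practice the R-peak positions come from the MIT/BIH annotation files rather than being supplied by hand.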
For the SPH DB, a nine-level wavelet decomposition of the original ECG recordings was performed using Daubechies wavelet 6 filters [19]. After eliminating the D1, D2 and A9 components, the remaining components were reconstructed to obtain the filtered signal. Then, following clinician recommendations, 1.18-s ECG heartbeats were obtained from the denoised signal based on the labels, using the same segmentation method as for the MIT/BIH DB described above. Finally, a total of 7881 ECG heartbeats were extracted.
Data balancing
In this study, the heartbeats of all categories were randomly divided into training set, validation set and test set. For each category, 70% of the heartbeats were allocated to the training set, 20% to the validation set, and 10% to the test set.
To address the severe imbalance in the MIT/BIH DB, where 74759 heartbeats are labeled as the N category and only 15 heartbeats as the Q category, data balancing combining oversampling and undersampling was applied in this study. The sample size of the large N category was reduced, while the smaller categories were duplicated. Table 1 shows the number of heartbeats in the 14 original categories of the MIT/BIH DB before and after balancing. It is worth noting that only 14 categories were balanced, because the sample size of the S category was too small for the classification task in this study. This combination of oversampling and undersampling enables a better evaluation of the performance of the proposed method.
Table 1.
Number of samples in each category before and after data balancing on the MIT/BIH DB
| Categories | Total number | Training set (before) | Training set (after) | Validation set | Test set |
|---|---|---|---|---|---|
| N | 74759 | 52331 | 12500 | 2000 | 1200 |
| L | 8071 | 5650 | 5650 | 1621 | 800 |
| R | 7236 | 5079 | 5079 | 1455 | 722 |
| e | 16 | 10 | 140 | 4 | 2 |
| j | 229 | 160 | 480 | 46 | 23 |
| A | 2546 | 1782 | 1782 | 509 | 255 |
| a | 150 | 105 | 420 | 30 | 15 |
| J | 83 | 58 | 348 | 17 | 8 |
| V | 7123 | 4986 | 4986 | 1426 | 711 |
| E | 108 | 76 | 152 | 20 | 10 |
| F | 802 | 561 | 1683 | 159 | 82 |
| / | 3619 | 2533 | 2533 | 724 | 362 |
| f | 260 | 182 | 546 | 52 | 26 |
| Q | 15 | 9 | 90 | 4 | 2 |
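The combination of undersampling and oversampling described above can be sketched as follows. The `cap` and `floor` values here are placeholders for illustration; the paper uses the per-category targets listed in Table 1:

```python
import numpy as np

rng = np.random.default_rng(0)

def balance_classes(train_sets, cap, floor):
    """Combine undersampling and oversampling on the training set.

    `train_sets` maps class label -> array of beats. Classes larger than
    `cap` are randomly subsampled down to `cap`; classes smaller than
    `floor` are duplicated (sampled with replacement) up to `floor`.
    """
    balanced = {}
    for label, beats in train_sets.items():
        n = len(beats)
        if n > cap:
            idx = rng.choice(n, size=cap, replace=False)   # undersample
        elif n < floor:
            idx = rng.choice(n, size=floor, replace=True)  # oversample
        else:
            idx = np.arange(n)
        balanced[label] = beats[idx]
    return balanced

# toy usage: a majority class of 500 beats and a minority class of 9 beats
data = {"N": np.zeros((500, 236)), "Q": np.ones((9, 236))}
out = balance_classes(data, cap=100, floor=90)
```

Only the training set is balanced; the validation and test sets keep their natural class proportions so that evaluation remains realistic.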
The MfPc Mapping
In this study, a MfPc Mapping method was proposed. This method transforms a heartbeat with a length of 236 points into a 2-D multi-feature RGB graph of fixed size. The steps of MfPc Mapping include Calculating RGB values using artificial features and Generating graph. Figure 2 shows the flowchart of the MfPc Mapping.
Fig. 2.
The flowchart of MfPc Mapping
Calculating RGB values using artificial features
For each heartbeat, statistical, dynamical and primitive features are first extracted, i.e., the amplitudes, the peak and the differences between neighboring points for all points. These features were selected for two reasons: first, they can be intuitively perceived and understood in the generated graph, enabling data visualization; second, experiments showed that 2-D multi-feature RGB graphs generated from these features yield excellent classification performance. Subsequently, the values of these features are mapped to the RGB color channels. The RGB values are calculated from the artificial features as follows:
First, min-max normalization maps the data into the range [0, 1]:

$$x'_i = \frac{x_i - x_{min}}{x_{max} - x_{min}} \tag{1}$$

where $x_{min}$ is the minimum value of the data, $x_{max}$ is the maximum value of the data, and $x_i$ (i = 1, 2, ..., 236) is the original data point.
Then, the R component $x_{Ri}$ is the normalized value $x'_i$ at each point, so it can be calculated as:

$$x_{Ri} = x'_i \tag{2}$$
Then, the G component $x_{Gi}$ of each point in the heartbeat can be calculated as:

$$x_{Gi} = 1 - \frac{|l_i - l_{max}|}{236} \tag{3}$$

where $l_i$ (i = 1, 2, ..., 236) is the index of each point and $l_{max}$ is the index of the maximum-value point of the heartbeat.
Finally, the B component $x_{Bi}$ of each point in the heartbeat can be calculated as:

$$x_{Bi} = |x'_i - x'_{i-1}| \tag{4}$$

with $x_{B1} = 0$.
After these calculations, the R, G, and B components of each point in the heartbeat are obtained: $x_{Ri}$ contains information about the amplitude, $x_{Gi}$ contains information about the peak, and $x_{Bi}$ contains information about the difference between neighboring points.
Generating graph
This step combines the computed R, G, and B components into the final 2-D multi-feature RGB graph. First, based on the waveform of the corresponding heartbeat, the R, G, and B components are plotted into corresponding subgraphs. The three subgraphs are then superimposed to form a 2-D multi-feature RGB graph, where the three RGB channels correspond to the three extracted features and the morphological features are also characterized in the final graph. As can be seen from Fig. 2, the resulting image clearly shows the statistical features (peaks), dynamical features (differences between adjacent points), primitive features (amplitudes) and morphological features of heartbeats, achieving multi-feature fusion and data visualization. Applying CAM to a 2-D multi-feature RGB graph can pinpoint the classification basis of the model, provide interpretability, and assist clinical diagnosis.
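The per-point RGB computation can be sketched as below. The exact forms of the G and B channels are assumptions consistent with the feature descriptions (peak distance and neighboring-point difference); the paper's precise formulas may differ, and the plotting of the components into subgraphs and their superimposition is omitted:

```python
import numpy as np

def mfpc_mapping(beat):
    """Sketch of the MfPc Mapping for one 236-point heartbeat.

    R encodes the normalized amplitude, G the (assumed) normalized
    distance of each sample from the peak, and B the absolute
    difference between neighboring samples, all scaled into [0, 1].
    """
    x = np.asarray(beat, dtype=float)
    x = (x - x.min()) / (x.max() - x.min() + 1e-8)  # Eq. (1): min-max
    r = x                                            # amplitude feature
    idx = np.arange(len(x))
    g = 1.0 - np.abs(idx - x.argmax()) / len(x)      # peak-distance feature (assumed form)
    b = np.abs(np.diff(x, prepend=x[0]))             # neighboring-difference feature
    b = b / (b.max() + 1e-8)
    return np.stack([r, g, b], axis=-1)              # per-sample RGB triplet

rgb = mfpc_mapping(np.sin(np.linspace(0, 2 * np.pi, 236)))
```

Each 236-sample beat thus yields a 236 x 3 array of RGB values, which are then rendered along the heartbeat waveform to produce the final image.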
FlexShuffleNet
FlexShuffleNet, designed with a flexible channel split ratio and SimAM, is introduced in this study. The flexible channel split ratio replaces the fixed split value of 0.5 in ShuffleNet V2 and is controlled by the hyperparameter r: a fraction r of the channels in the feature map is directed to the right branch of the Flex unit for the convolution operations, and the remaining fraction (1 − r) is directed to the left branch as an identity connection. When r is set to a larger value, more channels are directed to the right branch for convolutional operations. This enables the model to process the input features more thoroughly, enhancing its ability to fit the data and improving its capacity to learn and generalize, thus performing complex tasks more effectively. Conversely, when performing simple classification tasks, r can be set to a smaller value. This reduces the number of three-layer convolutional operations in the right branch, preventing the model from becoming overly complex, capturing irrelevant features or noise, and overfitting, thereby improving its generalization ability. Figure 3a and b show the structures of the down sampling Flex unit and the basic Flex unit, respectively. With the hyperparameter r, the ratio of convolution operations to identity connections in the unit can be controlled, allowing the model to better adapt to classification tasks of varying complexity. Fewer parameters usually indicate a less complex model, which might lead to poorer performance. Therefore, the parameter-free lightweight attention mechanism SimAM was introduced into the model to maintain high performance while ensuring a lightweight design.
Fig. 3.
The structures of the down sampling Flex unit and the basic Flex unit
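The flex channel split and the ShuffleNet-style channel shuffle that recombines the two branches can be illustrated in NumPy. This is a simplified sketch of just those two operations; the unit's 1 × 1 and depthwise convolutions and the SimAM module are omitted:

```python
import numpy as np

def flex_split(x, r):
    """Split channels with the flex ratio r (ShuffleNet V2 fixes r = 0.5).

    x has shape (C, H, W); round(r * C) channels go to the convolutional
    branch and the rest pass through as an identity connection.
    """
    c = x.shape[0]
    k = int(round(r * c))
    return x[c - k:], x[:c - k]   # (conv branch, identity branch)

def channel_shuffle(x, groups=2):
    """ShuffleNet-style channel shuffle mixing information across branches."""
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)

# toy usage: an 8-channel feature map split with r = 0.25
x = np.arange(8 * 2 * 2, dtype=float).reshape(8, 2, 2)
conv_in, skip = flex_split(x, r=0.25)
y = channel_shuffle(x)
```

With r = 0.25, two of the eight channels enter the convolutional branch and six pass through unchanged; the shuffle afterwards interleaves channels from the two branches so that information mixes across groups.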
Table 2 describes the structural details of FlexShuffleNet. The network first captures low-level local patterns and structures through the convolutional layer Conv1 and the max pooling layer MaxPool, followed by three stages (Stage2, Stage3 and Stage4) that capture more complex and abstract features. FlexShuffleNet uses a fixed r = 0.5 in Stage2 because the shallow network contains more fine-grained information and has fewer channels; using a smaller channel split ratio there would not significantly improve the efficiency of the model but might lose important information. The values of r in Stage3 and Stage4 are hyperparameters that can be adjusted. Each stage consists of one down sampling Flex unit and multiple basic Flex units: Stage2 contains three basic Flex units (with r = 0.5), while Stage3 and Stage4 each contain seven basic Flex units. Conv5 is the last convolutional layer in the network, which helps the network form a global understanding of the whole image and obtain high-level semantic information. At the end of the network are a global average pooling layer and a fully connected layer for classification.
Table 2.
The architecture of FlexShuffleNet
| Layer | Output size | Kernel size | Stride | FlexShuffleNet | Output channels |
|---|---|---|---|---|---|
| Conv1 | 112 × 112 | 3 × 3 | 2 | Conv1 & MaxPool × 1 | 24 |
| MaxPool | 56 × 56 | 3 × 3 | 2 | | |
| Stage2 | 28 × 28 | – | 2 | down sampling Flex Unit × 1 | 120 |
| | 28 × 28 | | 1 | r = 0.5 basic Flex Unit × 3 | |
| Stage3 | 14 × 14 | – | 2 | down sampling Flex Unit × 1 | 240 |
| | 14 × 14 | | 1 | basic Flex Unit × 7 | |
| Stage4 | 7 × 7 | – | 2 | down sampling Flex Unit × 1 | 480 |
| | 7 × 7 | | 1 | basic Flex Unit × 7 | |
| Conv5 | 7 × 7 | 1 × 1 | 1 | Conv5 × 1 | 1024 |
| GlobalPool | 1 × 1 | 7 × 7 | – | GlobalPool × 1 | – |
| FC | – | – | – | FC × 1 | 14 |
Based on different values of the hyperparameter r in Stage3 and Stage4, three types of FlexShuffleNet were designed in this study: FlexShuffleNet (0.25), FlexShuffleNet (Mixed), and FlexShuffleNet (0.375). Table 3 lists the hyperparameters of each type.
Table 3.
The channel split ratio in the three types of FlexShuffleNet
| Types of FlexShuffleNet | r of Stage3 | r of Stage4 |
|---|---|---|
| FlexShuffleNet (0.25) | 0.25 | 0.25 |
| FlexShuffleNet (Mixed) | 0.25 | 0.375 |
| FlexShuffleNet (0.375) | 0.375 | 0.375 |
Results
Evaluation metrics
The following metrics were employed to evaluate the performance of the proposed method. True positive (Tp) was defined as the number of heartbeats in a given category correctly classified as that category. False negative (Fn) was defined as the number of heartbeats in a given category incorrectly classified as other categories. True negative (Tn) was defined as the number of heartbeats of other categories not classified as the given category. False positive (Fp) was defined as the number of heartbeats in other categories incorrectly classified as the given category. Four statistical indices, i.e., accuracy (acc), positive predictive value (ppv), sensitivity (sen) and specificity (spec), were calculated to evaluate the performance of the proposed FlexShuffleNet in this study. acc is the proportion of samples correctly predicted by the model out of the total number of samples, ppv denotes the proportion of actual positive samples out of all samples predicted by the model as positive, sen denotes the proportion of positive samples correctly identified by the model out of all actual positive samples, and spec denotes the proportion of negative samples correctly identified by the model out of all actual negative samples. The acc, ppv, sen, and spec can be calculated as follows [21]:
$$acc = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \tag{5}$$

$$ppv = \frac{T_p}{T_p + F_p} \tag{6}$$

$$sen = \frac{T_p}{T_p + F_n} \tag{7}$$

$$spec = \frac{T_n}{T_n + F_p} \tag{8}$$
The F1-score was also employed to evaluate the performance of the proposed method, because classification of arrhythmia heartbeats is a multi-class problem. The F1-score is the harmonic mean of precision and recall; it reaches its best value at 1 and its worst at 0.
$$F1_c = \frac{2 \times ppv_c \times sen_c}{ppv_c + sen_c} \tag{9}$$

$$F1 = \frac{1}{14} \sum_{c=1}^{14} F1_c \tag{10}$$

where $F1_c$, $ppv_c$, and $sen_c$ represent the F1-score, ppv, and sen for a particular category, respectively. The F1-score for the 14-category classification task is the arithmetic mean of the per-category F1-scores.
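Eqs. (5)-(10) can be computed directly from a confusion matrix; a minimal sketch:

```python
import numpy as np

def per_class_metrics(cm):
    """Compute acc, ppv, sen, spec, and F1 per class (Eqs. 5-10).

    `cm[i, j]` counts heartbeats of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)                 # correct predictions per class
    fn = cm.sum(axis=1) - tp         # missed members of each class
    fp = cm.sum(axis=0) - tp         # wrongly assigned to each class
    tn = total - tp - fn - fp
    acc = (tp + tn) / total
    ppv = tp / (tp + fp)
    sen = tp / (tp + fn)
    spec = tn / (tn + fp)
    f1 = 2 * ppv * sen / (ppv + sen)
    return acc, ppv, sen, spec, f1

# 2x2 example: 8 of 10 positives and 90 of 100 negatives classified correctly
cm = [[8, 2], [10, 90]]
acc, ppv, sen, spec, f1 = per_class_metrics(cm)
```

The overall F1-score reported in the paper is then the arithmetic mean of the per-class values, i.e. `f1.mean()`.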
Classification results
Results for the MIT/BIH DB
Table 4 presents the confusion matrix and classification results for the 14-category heartbeats in the MIT/BIH DB after data balancing, obtained by the proposed method trained at a learning rate of 0.001. FlexShuffleNet (Mixed) yielded average values of the five metrics of acc 99.77%, ppv 94.60%, sen 89.83%, spec 99.85% and F1-score 0.9125; less than 1% of the heartbeats were wrongly classified. For categories with large sample sizes, 99.91% of heartbeats were correctly classified as the L category, 99.05% as the N category, 99.98% as the / category, and 99.86% as the R category. The Q and e categories yielded the lowest ppv (50.00%) and sen (66.67%), respectively, due to their small sample sizes, which made it challenging for the model to learn their features effectively. These two categories also had a notable impact on the average precision and sensitivity.
Table 4.
Confusion matrix and classification results of 14-category heartbeats in MIT/BIH DB using FlexShuffleNet (Mixed)
| Predicted | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | E | F | J | L | N | P | Q | V | R | a | e | f | j | acc (%) | ppv (%) | sen (%) | spec (%) | F1 | ||
| A | 237 | 0 | 0 | 1 | 0 | 16 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 99.34 | 92.94 | 95.95 | 99.75 | 0.9442 | |
| E | 0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 99.98 | 90.00 | 100.00 | 100.00 | 0.9474 | |
| F | 2 | 0 | 68 | 0 | 0 | 4 | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 99.62 | 85.00 | 94.44 | 99.90 | 0.8947 | |
| J | 1 | 0 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 99.95 | 87.50 | 87.50 | 99.98 | 0.8750 | |
| L | 0 | 0 | 0 | 0 | 797 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 99.91 | 99.62 | 99.87 | 99.97 | 0.9975 | |
| N | 2 | 0 | 1 | 0 | 0 | 1191 | 0 | 0 | 1 | 4 | 0 | 0 | 0 | 1 | 99.05 | 99.25 | 97.46 | 98.97 | 0.9835 | |
| P | 1 | 0 | 0 | 0 | 0 | 0 | 361 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 99.98 | 99.72 | 100.00 | 100.00 | 0.9986 | |
| Q | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 99.98 | 50.00 | 100.00 | 100.00 | 0.6667 | |
| R | 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 717 | 1 | 0 | 0 | 0 | 0 | 99.86 | 99.31 | 99.86 | 99.97 | 0.9958 | |
| V | 2 | 0 | 3 | 0 | 1 | 2 | 0 | 0 | 0 | 703 | 0 | 0 | 1 | 0 | 99.38 | 98.74 | 97.64 | 99.51 | 0.9818 | |
| a | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 13 | 0 | 0 | 0 | 99.95 | 86.67 | 100.00 | 100.00 | 0.9286 | |
| e | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 99.98 | 100.00 | 66.67 | 99.98 | 0.8000 | |
| f | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 1 | 99.95 | 96.15 | 96.15 | 99.98 | 0.9615 | |
| True | j | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16 | 99.81 | 72.73 | 88.89 | 99.95 | 0.8000 |
| The average | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – | 99.77 | 94.60 | 89.83 | 99.85 | 0.9125 |
Results on the SPH DB
Table 5 presents the classification results obtained by the proposed method on the SPH DB, trained at a learning rate of 0.001. FlexShuffleNet (Mixed) yielded the averages of the five metrics i.e., acc 92.08%, ppv 93.63%, sen 92.69%, spec 91.25% and F1 0.9315.
Table 5.
Confusion matrix and classification results of 2-kind heartbeats in SPH DB using FlexShuffleNet (Mixed)
| Predicted | ||||||||
|---|---|---|---|---|---|---|---|---|
| AF | NAF | acc (%) | ppv (%) | sen (%) | spec (%) | F1 | ||
| AF | 4245 | 289 | ||||||
| True | NAF | 335 | 3012 | 92.08 | 93.63 | 92.69 | 91.25 | 0.9315 |
Adaptation of FlexShuffleNet to classification tasks of different complexity
The 14-category classification task on the MIT/BIH DB can be considered to have higher classification complexity due to its larger number of categories and larger amount of data, whereas the binary classification task on the SPH DB has lower complexity. Table 6 shows the performance of the different types of FlexShuffleNet on the two tasks. On the MIT/BIH DB, FlexShuffleNet (Mixed) achieves the best results, FlexShuffleNet (0.375) is second best and FlexShuffleNet (0.25) is worst. On the SPH DB, FlexShuffleNet (0.25) achieves the best results, with FlexShuffleNet (Mixed) and FlexShuffleNet (0.375) in second and third place, respectively. This indicates that, for classification tasks of different complexity, choosing the appropriate FlexShuffleNet can effectively improve classification performance, prevent overfitting and improve generalization ability.
Table 6.
Performance of different types of FlexShuffleNet under different complexity classification tasks
| Task | Database | FlexShuffleNet | acc (%) | ppv (%) | sen (%) | spec (%) | F1 |
|---|---|---|---|---|---|---|---|
| 14 categories | MIT/BIH DB | FlexShuffleNet (0.25) | 99.71 | 93.55 | 88.76 | 99.83 | 0.9012 |
| FlexShuffleNet (Mixed) | 99.77 | 94.60 | 89.83 | 99.85 | 0.9125 | ||
| FlexShuffleNet (0.375) | 99.73 | 94.21 | 89.15 | 99.83 | 0.9055 | ||
| 2 categories | SPH DB | FlexShuffleNet (0.25) | 92.33 | 94.15 | 93.05 | 92.14 | 0.9360 |
| FlexShuffleNet (Mixed) | 92.08 | 93.63 | 92.69 | 91.25 | 0.9315 | ||
| FlexShuffleNet (0.375) | 91.69 | 93.05 | 92.14 | 90.89 | 0.9259 |
Comparison with the previous work
Table 7 compares the proposed method with previous studies on heartbeats classification using various machine learning and deep learning methods published after 2018; the proposed method yielded the highest acc of 99.77%. Zhang et al. [14] used hybrid time-frequency analysis and transfer learning based on ResNet to classify 14-category arrhythmias. Similar to this study, Zhang et al. [14] excluded the S category to obtain satisfactory results; they obtained a lower acc of 99.75%, a higher sen of 91.36%, the same spec of 99.85%, and a lower F1 of 0.9000, compared with the acc of 99.77%, sen of 89.83%, spec of 99.85%, and F1 of 0.9125 achieved here. However, Zhang et al. [14] used the Hilbert transform, with O(N log N) computational complexity, together with the Wigner–Ville distribution, which is even more expensive to compute. In contrast, the computational complexity of the MfPc Mapping proposed in this paper is O(N), making it a more efficient way to transform 1-D data into 2-D data. In addition, FlexShuffleNet is far superior to ResNet-101 in lightweight design, with advantages in the number of parameters, model size, and inference speed, and can thus better meet real-time demands. Kuila et al. [25] proposed a combination of an extreme learning machine and an RNN to classify 14 categories of heartbeats. In comparison, the proposed method has 3.36% higher acc and 7.19% higher spec, but 3.79% lower sen. Kuila et al. [25] used 1-D data for classification, which contains very limited information and does not show good generalization ability. The lower sensitivity of the proposed method can be attributed to the very small number of samples in the J, e, and j categories. As can be seen from Table 4, for the e category, even a single misclassified beat leads to a sen of only 66.7%, since the test set contains so few e-category samples. Similarly, the sen for the J and j categories are only 87.50% and 88.89%, respectively. In contrast, Shaker et al. [23] proposed the use of a GAN for data balancing, which resulted in an excellent sen of 99.77%. However, the proposed method outperforms it by 1.47%, 4.60% and 0.62% in terms of acc, ppv and spec, respectively.
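The small-sample effect described above can be reproduced with a short calculation. The confusion counts here are illustrative assumptions chosen to match the reported 66.7% (the exact per-class counts are in Table 4):

```python
def sensitivity(tp: int, fn: int) -> float:
    """Per-class sensitivity (recall): TP / (TP + FN)."""
    return tp / (tp + fn)

# Assumed counts: with only a handful of e-category beats in the
# test set, a single false negative drops sensitivity to 66.7%.
sen_e = sensitivity(tp=2, fn=1)
print(f"{sen_e:.1%}")  # 66.7%
```

This is why per-class sensitivity is a brittle metric for the rarest categories: one misclassification shifts it by tens of percentage points.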
Table 7.
Comparison of this study with previous works
| Author | Year | Task | Database | Methods | acc (%) | ppv (%) | sen (%) | spec (%) | F1 |
|---|---|---|---|---|---|---|---|---|---|
| El-Saadawy et al. [22] | 2018 | 15 categories | MIT/BIH DB | SVM | 94.94 | 93.19 | – | – | – |
| Shaker et al. [23] | 2020 | 15 categories | MIT/BIH DB | GAN + CNN | 98.30 | 90.00 | 99.77 | 99.23 | – |
| Tao et al. [24] | 2020 | 15 categories | MIT/BIH DB | NN | 93.90 | 100.00 | – | – | – |
| Zhang et al. [14] | 2021 | 14 categories | MIT/BIH DB | 2-D CNN | 99.75 | – | 91.36 | 99.85 | 0.9000 |
| Kuila et al. [25] | 2022 | 14 categories | MIT/BIH DB | ELM-RNN | 96.41 | – | 93.62 | 92.66 | – |
| This work | 2023 | 14 categories | MIT/BIH DB | 2-D CNN | 99.77 | 94.60 | 89.83 | 99.85 | 0.9125 |
Discussion
Comparison with the classical models
Table 8 compares the performance and efficiency of the FlexShuffleNet proposed in this paper with ShuffleNet V2, ResNet18 [26], and GoogLeNet [27]. Params denotes the total number of trainable parameters in the model, serving as a metric for computational space complexity during model training. Multiply-accumulate operations (MACs) measure the computational load of a model by quantifying its total number of multiply-add operations; neural networks involve numerous multiply-accumulate operations, matrix-vector multiplications, and general matrix-matrix multiplications, each comprising many individual multiply-add operations. Frames per second (FPS) indicates the number of frames (images) the model can process per second, offering an assessment of its inference speed. Model size (Size) denotes the storage space occupied by the trained model, with data stored in float32 format.
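For a standard convolutional layer, Params and MACs have closed-form expressions; a minimal sketch (the layer shape below is illustrative, not taken from FlexShuffleNet):

```python
def conv2d_cost(c_in: int, c_out: int, k: int, h_out: int, w_out: int,
                bias: bool = True) -> tuple[int, int]:
    """Trainable parameters and MACs of one k x k convolutional layer."""
    params = (k * k * c_in + (1 if bias else 0)) * c_out
    # Each output element requires k*k*c_in multiply-add operations.
    macs = k * k * c_in * c_out * h_out * w_out
    return params, macs

# Example: 3x3 conv, 32 -> 64 channels, 56x56 output feature map.
params, macs = conv2d_cost(c_in=32, c_out=64, k=3, h_out=56, w_out=56)
print(params)      # 18496 parameters
print(macs / 1e6)  # ~57.8 MMACs
```

Summing these per-layer figures over a network yields the Params and MACs columns of Table 8; in practice they are usually collected automatically by a profiling tool.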
Table 8.
Comparison of the performance and efficiency of the proposed model with previous models
| | Model | Params (M) | MACs (M) | FPS | Size (MB) | acc (%) |
|---|---|---|---|---|---|---|
| Previous | ShuffleNet V2 | 2.27 | 13.41 | 53.55 | 5.00 | 99.74 |
| | ResNet18 | 11.68 | 149.37 | 13.64 | 42.74 | 99.78 |
| | GoogLeNet | 5.61 | 123.31 | 13.28 | 38.22 | 99.69 |
| This work | FlexShuffleNet (0.25) | 0.92 | 9.47 | 54.76 | 3.64 | 99.71 |
| | FlexShuffleNet (Mixed) | 1.02 | 9.92 | 54.01 | 4.07 | 99.77 |
| | FlexShuffleNet (0.375) | 1.09 | 10.99 | 49.15 | 4.32 | 99.73 |
The three types of FlexShuffleNet demonstrated a substantial reduction in Params compared to ShuffleNet V2, with corresponding improvements in MACs, FPS, and Size. In particular, FlexShuffleNet (Mixed) achieved a superior accuracy of 99.77%. In comparison to GoogLeNet, all FlexShuffleNet variants performed better on all five metrics. Against ResNet18, FlexShuffleNet exhibited a significant advantage in Params, MACs, FPS, and Size. Notably, FlexShuffleNet (Mixed) had only 8.73% of the Params and 6.64% of the MACs of ResNet18, while maintaining nearly identical accuracy, with a marginal decrease of 0.01%. Regarding inference speed, FlexShuffleNet (Mixed) was approximately 3.96 times faster than ResNet18, and its Size was less than one-tenth that of ResNet18. Collectively, FlexShuffleNet achieves a better balance of performance and efficiency than ResNet18 in the classification task of this study.
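The relative-efficiency figures quoted above follow directly from the values in Table 8; as a quick check:

```python
# Values taken from Table 8.
resnet18 = {"params": 11.68, "macs": 149.37, "fps": 13.64, "size": 42.74}
flex_mixed = {"params": 1.02, "macs": 9.92, "fps": 54.01, "size": 4.07}

print(f"{flex_mixed['params'] / resnet18['params']:.2%}")  # 8.73% of the Params
print(f"{flex_mixed['macs'] / resnet18['macs']:.2%}")      # 6.64% of the MACs
print(f"{flex_mixed['fps'] / resnet18['fps']:.2f}x")       # 3.96x inference speed
print(f"{flex_mixed['size'] / resnet18['size']:.2%}")      # ~9.5% of the Size
```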
FlexShuffleNet was designed to be more lightweight, substantially reducing the Params and the MACs, while maintaining a high level of accuracy and enabling improved inference speed, striking a balance between inference speed and performance.
Ablation experiments
The ablation experiment was designed to assess the efficacy of the three selected features in heartbeat classification and comprised three distinct experiments: experiment 1, experiment 2, and experiment 3. Each experiment involved the removal of one channel to evaluate its impact on classification performance. Specifically, experiments 1, 2, and 3 entailed the removal of the red channel, green channel, and blue channel, respectively, allowing for the assessment of the influence of the amplitudes, the peak, and the differences between neighboring points on classification effectiveness. All ablation experiments were conducted following the procedure outlined in Fig. 1. The comparative classification results of the three ablation experiments and the proposed method are illustrated in Fig. 4.
Fig. 4.
The results of ablation experiments
Collectively, experiment 3 yielded the worst classification result. Compared with the proposed method, acc, ppv, sen, and spec were reduced by 0.11%, 6.8%, 9.13%, and 0.05%, respectively. This indicates that the blue channel, which contains the differences between neighboring data points, plays a crucial role in heartbeats classification. For experiment 2, although the green channel, which contains the peak value and its location, represents less information, it still significantly impacts the performance of the model: in comparison to the proposed method, acc, ppv, sen, and spec decreased by 0.07%, 1.71%, 4.31%, and 0.03%, respectively. As for experiment 1, where the red channel containing the amplitudes was removed, it yielded a better performance than experiments 2 and 3.
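Based on the channel roles described above, the MfPc Mapping can be sketched as follows. This is a simplified illustration, not the authors' exact implementation: the beat length, image side, and min-max normalization are assumptions made for the example.

```python
import numpy as np

def mfpc_mapping_sketch(beat: np.ndarray, side: int = 16) -> np.ndarray:
    """Map a 1-D heartbeat of length side*side to an RGB image:
    R = amplitudes, G = peak value/location, B = neighbor differences."""
    assert beat.size == side * side
    norm = lambda x: (x - x.min()) / (np.ptp(x) + 1e-8)

    red = norm(beat)                             # amplitudes
    green = np.zeros_like(beat)
    green[np.argmax(beat)] = 1.0                 # peak value and its location
    blue = norm(np.diff(beat, prepend=beat[0]))  # neighbor differences

    return np.stack([c.reshape(side, side) for c in (red, green, blue)],
                    axis=-1)

# Hypothetical input: a synthetic 256-sample beat.
img = mfpc_mapping_sketch(np.sin(np.linspace(0, 3 * np.pi, 256)))
print(img.shape)  # (16, 16, 3)
```

Removing one channel, as in the ablation experiments, amounts to zeroing the corresponding plane of this RGB graph before classification.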
Interpretability of model decisions
In order to interpret the decisions of the FlexShuffleNet, global average pooling (GAP) was employed to generate class activation maps (CAMs) [28] from the last feature layers of the trained FlexShuffleNet (Mixed). These heatmaps are defined as an attention-based mechanism that mimics human perception and are considered a form of selective attention for identifying the regions with the highest impact on the decisions [29]. For each category, we extracted the heatmap of each sample in the test set using the GAP. The obtained heatmaps were then summed and averaged to create a unique CAM for each category. An example of obtaining the unique CAM for the N category is illustrated in Fig. 5, while the unique CAMs for all 14 heartbeat categories are presented in Fig. 6. For several categories, i.e., A, E, f, L, P, R, and V, the model concentrates more on information related to the Q-R-S intervals when making decisions. In contrast, for the a, e, and F categories, the model pays more attention to information about the P-R intervals and S-T segments. The J and j categories receive more focus on information about the S-T segments. For the N category, the model emphasizes information about the P-R intervals. However, for the Q category, which is annotated as the unclassifiable beat, the small number of samples makes it difficult for the model to learn and classify this category effectively, leading to less obvious focus areas.
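In this scheme, the CAM for class c is a weighted sum of the final feature maps, M_c(x, y) = Σ_k w_k^c f_k(x, y), where w^c are the weights of the classification layer after GAP [28]. A minimal NumPy sketch (the feature-map and class counts are illustrative):

```python
import numpy as np

def class_activation_map(features: np.ndarray, fc_weights: np.ndarray,
                         cls: int) -> np.ndarray:
    """features: (K, H, W) last-layer feature maps for one sample.
    fc_weights: (num_classes, K) weights of the layer after GAP.
    Returns the (H, W) class activation map for class `cls`."""
    cam = np.tensordot(fc_weights[cls], features, axes=1)  # sum_k w_k * f_k
    cam -= cam.min()                   # shift to non-negative
    return cam / (cam.max() + 1e-8)    # normalize to [0, 1] for display

# Hypothetical shapes: 64 feature maps of size 7x7, 14 heartbeat categories.
rng = np.random.default_rng(0)
feats = rng.random((64, 7, 7))
weights = rng.random((14, 64))
cam = class_activation_map(feats, weights, cls=0)
print(cam.shape)  # (7, 7)
```

Averaging such per-sample maps over all correctly classified test samples of one category gives the unique CAM shown in Fig. 5.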
Fig. 5.
The method of getting the unique CAM
Fig. 6.
Focus of model on 14-category heartbeats, obtained by overlapping randomly selected samples of correctly predicted and unique CAM for this category
Comparison of MfPc Mapping with 1-D methods
Table 9 compares the classification results on the MIT/BIH DB of the MfPc Mapping and FlexShuffleNet (Mixed) proposed in this paper with 1-D models. In the 1-D experiments, the ECG recordings, the differences between neighboring points, and the peaks were combined into a 1-D input for training and classification. It can be concluded that the 2-D FlexShuffleNet (Mixed) outperforms all the 1-D models, which indicates that MfPc Mapping does help the model learn more information from the heartbeat data and thus improves the classification accuracy. Among the five 1-D models, the 1-D FlexShuffleNet (Mixed) achieves the best results, which suggests that FlexShuffleNet (Mixed) outperforms the other classical 1-D models in heartbeats classification. Therefore, using FlexShuffleNet (Mixed) for heartbeat classification on the 2-D multi-feature RGB graphs obtained through MfPc Mapping is an effective method.
Table 9.
Comparison of MfPc mapping and 2-D FlexShuffleNet (mixed) with 1-D methods
| Algorithm | acc (%) | ppv (%) | sen (%) | spec (%) | F1 |
|---|---|---|---|---|---|
| MfPc Mapping + FlexShuffleNet (Mixed) | 99.77 | 94.60 | 89.83 | 99.85 | 0.9125 |
| 1-D FlexShuffleNet (Mixed) | 99.65 | 93.85 | 86.22 | 99.70 | 0.8987 |
| 1-D ShuffleNet V2 | 99.34 | 90.25 | 75.19 | 99.84 | 0.8203 |
| 1-D ResNet18 | 99.56 | 93.76 | 82.45 | 99.83 | 0.8774 |
| 1-D GoogLeNet | 98.85 | 87.85 | 80.65 | 98.54 | 0.8410 |
| 1-D AlexNet | 97.35 | 87.60 | 81.37 | 98.07 | 0.8437 |
Conclusion
MfPc Mapping and FlexShuffleNet are effective methods for classifying 14 categories of heartbeats. MfPc Mapping is a 1-D to 2-D data transformation method that achieves multi-feature fusion, data visualization, and interpretability. FlexShuffleNet is characterized by a lightweight design, few parameters, low computation, and low inference latency, and can be adapted to classification tasks of varying complexity. Integrating MfPc Mapping and FlexShuffleNet into an existing healthcare information system or electronic medical record system enables fast and reliable diagnosis. Real-time feedback of the analyzed results lets doctors quickly diagnose patients and give treatments, thus reducing the probability of misdiagnosis and healthcare costs. However, unlike 1-D CNNs or RNNs, the proposed method shows satisfactory results on datasets with shorter signal segments (e.g., MIT/BIH DB and SPH DB) but is not applicable to relatively long ECG recordings.
The FlexShuffleNet (Mixed) achieved acc 99.77%, ppv 94.60%, sen 89.83%, spec 99.85%, and F1 0.9125 in the 14-category classification task on the MIT/BIH DB. This demonstrates the potential of the proposed method as a powerful tool in clinical diagnostics.
In future research, we hope to test the proposed method on more types of physiological signals, explore the possibility of more feature fusion, improve the model architecture, and realize efficient real-time monitoring and diagnosis, so as to provide more effective help to doctors and more reliable diagnosis to patients.
Author contributions
Yijun Ma: Writing-original draft, Conceptualization, Software, Data Curation, Validation. Junyan Li: Validation, Formal analysis, Data Curation. Jinbiao Zhang: Formal analysis, Validation. Jilin Wang: Software, Validation. Guozhen Sun: Software, Validation. Yatao Zhang: Conceptualization, Writing-Reviewing and Editing, Supervision, Methodology
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 82072014, 62076149, and 62376136.
Data availability
Not applicable.
Material availability
Not applicable.
Code availability
Not applicable.
Declaration
Conflict of interest
The authors declare that there are no conflicts of interest to this work.
Ethical approval
This study was approved by the Ethics Committee of Shandong Provincial Hospital.
Consent for publication
Not applicable.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Organization WH. Cardiovascular diseases (CVDs). https://www.who.int/en/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds). Accessed 15 April 2024.
- 2.Organisation WH. Uses of the electrocardiogram. EURO reports and studies, vol. 37. Regional Office for Europe, Copenhagen (1981). Report on a WHO study; project ICP/ATH 003
- 3.He J, Sun L, Rong J, Wang H, Zhang Y. A pyramid-like model for heartbeat classification from ECG recordings. PLoS One. 2018;13(11):0206593. 10.1371/journal.pone.0206593. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Luz EJ, Schwartz WR, Cámara-Chávez G, Menotti D. ECG-based heartbeat classification for arrhythmia detection: a survey. Comput Methods Programs Biomed. 2016;127:144–64. 10.1016/j.cmpb.2015.12.008. [DOI] [PubMed] [Google Scholar]
- 5.Martis RJ, Acharya UR, Ray AK, Chakraborty C. Application of higher order cumulants to ECG signals for the cardiac health diagnosis. Annu Int Conf IEEE Eng Med Biol Soc. 2011;2011:1697–700. 10.1109/IEMBS.2011.6090487. [DOI] [PubMed] [Google Scholar]
- 6.Thilagavathy R, Srivatsan R, Sreekarun S, Sudeshna D, Priya PL, Venkataramani B. Real-time ecg signal feature extraction and classification using support vector machine. 2020 Int Conf Contemp Comput Appl (IC3A). 2020. 10.1109/IC3A48958.2020.233266. [Google Scholar]
- 7.Yang P, Wang D, Zhao W-B, Fu L-H, Du J-L, Su H. Ensemble of kernel extreme learning machine based random forest classifiers for automatic heartbeat classification. Biomed Signal Proc Control. 2021;63:102138. 10.1016/j.bspc.2020.102138. [Google Scholar]
- 8.Wang J. Automated detection of atrial fibrillation and atrial flutter in ecg signals based on convolutional and improved elman neural network. Knowl-Based Syst. 2020;193:105446. 10.1016/j.knosys.2019.105446. [Google Scholar]
- 9.Xin H, Chen Z, Zhuo H, Qinghui C, Shaojie T, Jinshan T, Weihua Z. A novel method for ECG signal classification via one-dimensional convolutional neural network. Multimed Syst. 2020;28:1387–99. 10.1007/s00530-020-00713-1. [Google Scholar]
- 10.Hasan NI, Bhattacharjee A. Deep learning approach to cardiovascular disease classification employing modified ECG signal from empirical mode decomposition. Biomed Signal Process Control. 2019;52:128–40. 10.1016/j.bspc.2019.04.005. [Google Scholar]
- 11.Li X, Zhang F, Sun Z, Li D, Kong X, Zhang Y. Automatic heartbeat classification using s-shaped reconstruction and a squeeze-and-excitation residual network. Comput Biol Med. 2022. 10.1016/j.compbiomed.2021.105108. [DOI] [PubMed] [Google Scholar]
- 12.Li Y, Zhang L, Zhu L, Liu L, Han B, Zhang Y, Wei S. Diagnosis of atrial fibrillation using self-complementary attentional convolutional neural network. Comput Methods Programs Biomed. 2023;238:107565. 10.1016/j.cmpb.2023.107565. [DOI] [PubMed] [Google Scholar]
- 13.Mathunjwa BM, Lin YT, Lin CH, Abbod MF, Sadrawi M, Shieh JS. ECG recurrence plot-based arrhythmia classification using two-dimensional deep residual CNN features. Sensors (Basel). 2022. 10.3390/s22041660. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Zhang Y, Li J, Wei S, Zhou F, Li D. Heartbeats classification using hybrid time-frequency analysis and transfer learning based on ResNet. IEEE J Biomed Health Inform. 2021;25(11):4175–84. 10.1109/JBHI.2021.3085318. [DOI] [PubMed] [Google Scholar]
- 15.Ma N, Zhang X, Zheng H-T, Sun J. ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y, editors. Computer vision—ECCV. Cham: Springer; 2018. p. 122–38. [Google Scholar]
- 16.Frintrop S. Computational visual attention. London: Springer; 2011. p. 69–101. 10.1007/978-0-85729-994-9_4. [Google Scholar]
- 17.Moody GB, Mark RG. The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol Mag. 2001;20(3):45–50. 10.1109/51.932724. [DOI] [PubMed] [Google Scholar]
- 18.Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):215–20. 10.1161/01.cir.101.23.e215. [DOI] [PubMed] [Google Scholar]
- 19.Singh BN, Tiwari AK. Optimal selection of wavelet basis function applied to ECG signal denoising. Digital Signal Processing. 2006;16(3):275–87. 10.1016/J.DSP.2005.12.003. [Google Scholar]
- 20.Wang J-S, Chiang W-C, Yang Y-TC, Hsu Y-L. An effective ECG arrhythmia classification algorithm. In: Huang D-S, Gan Y, Premaratne P, Han K, editors. Bio-inspired computing and applications. Berlin: Springer; 2012. p. 545–50. 10.1007/978-3-642-24553-4_72. [Google Scholar]
- 21.Acharya UR, Krishnan SM. Advances in cardiac recording processing. Cham: Springer; 2007. [Google Scholar]
- 22.El-Saadawy H, Tantawi M, Shedeed HA, Tolba MF. Hybrid hierarchical method for electrocardiogram heartbeat classification. IET Signal Process. 2018;12(4):506–13. 10.1049/iet-spr.2017.0108. [Google Scholar]
- 23.Shaker AM, Tantawi M, Shedeed HA, Tolba MF. Generalization of convolutional neural networks for ecg classification using generative adversarial networks. IEEE Access. 2020;8:35592–605. 10.1109/ACCESS.2020.2974712. [Google Scholar]
- 24.Tao Y, Yue G, Wang K, Zhang Y, Jiang B. A cascaded step-temporal attention network for ECG arrhythmia classification. Int Joint Conf Neural Netw. 2020. 10.1109/IJCNN48605.2020.9206890. [Google Scholar]
- 25.Kuila S, Dhanda N, Joardar S. Ecg signal classification and arrhythmia detection using elm-rnn. Multimed Tools Appl. 2022;81:25233–49. 10.1007/s11042-022-11957-6. [Google Scholar]
- 26.He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. 2015. 10.48550/arXiv.1512.03385
- 27.Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions. 2015 IEEE Conf Comput Vision Pattern Recognit (CVPR). 2015. 10.1109/CVPR.2015.7298594. [Google Scholar]
- 28.Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. 2016 IEEE Conf Comput Vision Pattern Recognit (CVPR). 2016. 10.1109/CVPR.2016.319. [Google Scholar]
- 29.Liu Y, Ji L, Huang R, Ming T, Gao C, Zhang J. An attention-gated convolutional neural network for sentence classification. 2018. 10.48550/arXiv.1808.07325