Abstract:
Objective emotion recognition is important in fields such as physiological health, healthcare, and education. In terms of acquisition difficulty, cost, and user acceptance, electrocardiogram (ECG) signals are appropriate biomarkers for objective emotion recognition. Because it is difficult for deep-learning methods to extract and fuse the spatio-temporal features of one-dimensional ECG signals, a 1D-to-2D signal transformation method based on wavelet packet decomposition was first proposed, which converts one-dimensional ECG signals into "two-dimensional images". Subsequently, ResNet18 was adopted as the backbone network with these "two-dimensional images" as its input, and a Fusion Block module was designed to improve the network's spatio-temporal feature extraction and fusion capabilities. Finally, extensive experiments on the emotion recognition task were conducted on the WESAD and SWELL-KW datasets. The experimental results demonstrate that, compared with the second-best method, the proposed method improves average accuracy and F1 score by 2.19 and 4.48 percentage points, respectively, which may provide technical support for objective emotion recognition.
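The 1D-to-2D transformation described above can be sketched as follows. This is a minimal illustration using the PyWavelets library, not the authors' implementation: the function name `ecg_to_image`, the `db4` wavelet, the decomposition level, and the toy input signal are all assumptions for demonstration. The idea is that a level-L wavelet packet decomposition yields 2^L equal-length leaf coefficient vectors, which can be stacked row-wise into a two-dimensional array.

```python
import numpy as np
import pywt  # PyWavelets


def ecg_to_image(signal, wavelet="db4", level=4):
    """Convert a 1-D signal into a 2-D array via wavelet packet decomposition.

    Hypothetical sketch: wavelet choice and level are illustrative, not
    taken from the paper.
    """
    # Build the full wavelet packet tree down to the requested level.
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    # Collect the 2**level leaf nodes in frequency order; all leaves at the
    # same level have equal length, so stacking them gives a 2-D "image".
    leaves = [node.data for node in wp.get_level(level, order="freq")]
    return np.stack(leaves)


# Toy 1-D "ECG" signal; a real pipeline would use a windowed ECG segment.
sig = np.sin(np.linspace(0, 8 * np.pi, 512))
img = ecg_to_image(sig)
print(img.shape)  # 16 rows (2**4 sub-bands) by the leaf coefficient length
```

The resulting array can then be fed to a 2-D convolutional backbone such as ResNet18, possibly after resizing or channel replication to match the expected input shape.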