Alexithymia has been associated with emotion recognition deficits in both auditory and visual domains. Although emotions are inherently multimodal in daily life, little is known regarding abnormalities of emotional multisensory integration (eMSI) in relation to alexithymia. Here, we employed an emotional Stroop-like audiovisual task while recording event-related potentials (ERPs) in individuals with high alexithymia levels (HA) and low alexithymia levels (LA). During the task, participants had to indicate whether a voice was spoken in a sad or angry prosody while ignoring the simultaneously presented static face which could be either emotionally congruent or incongruent to the human voice. We found that HA performed worse and showed higher P2 amplitudes than LA independent of emotion congruency. Furthermore, difficulties in identifying and describing feelings were positively correlated with the P2 component, and P2 correlated negatively with behavioral performance. Bayesian statistics showed no group differences in eMSI and classical integration-related ERP components (N1 and N2). Although individuals with alexithymia indeed showed deficits in auditory emotion recognition as indexed by decreased performance and higher P2 amplitudes, the present findings suggest an intact capacity to integrate emotional information from multiple channels in alexithymia. Our work provides valuable insights into the relationship between alexithymia and neuropsychological mechanisms of emotional multisensory integration.