Multimodal emotion recognition based on peak frame selection from video


Zhalehpour S., Akhtar Z., Erdem Ç.

SIGNAL IMAGE AND VIDEO PROCESSING, vol.10, no.5, pp.827-834, 2016 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 10 Issue: 5
  • Publication Date: 2016
  • Doi Number: 10.1007/s11760-015-0822-0
  • Journal Name: SIGNAL IMAGE AND VIDEO PROCESSING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.827-834
  • Keywords: Affective computing, Facial expression recognition, Apex frame, Audio-visual emotion recognition, Fusion
  • Marmara University Affiliated: Yes

Abstract

We present a fully automatic multimodal emotion recognition system based on three novel peak frame selection approaches using the video channel. Selection of peak frames (i.e., apex frames) is an important preprocessing step for facial expression recognition as they contain the most relevant information for classification. Two of the three proposed peak frame selection methods (i.e., MAXDIST and DEND-CLUSTER) do not employ any training or prior learning. The third method proposed for peak frame selection (i.e., EIFS) is based on measuring the "distance" of the expressive face from the subspace of neutral facial expression, which requires a prior learning step to model the subspace of neutral face shapes. The audio and video modalities are fused at the decision level. The subject-independent audio-visual emotion recognition system has shown promising results on two databases in two different languages (eNTERFACE and BAUM-1a).
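The EIFS idea described above, scoring a frame by its "distance" from a learned subspace of neutral face shapes, can be illustrated with a minimal sketch. This is not the authors' exact method; it assumes a simple PCA subspace over landmark vectors and uses reconstruction error as the distance, with all function names (`fit_neutral_subspace`, `select_peak_frame`) and data shapes being illustrative choices:

```python
import numpy as np

def fit_neutral_subspace(neutral_shapes, n_components=5):
    """Learn a low-dimensional subspace of neutral face shapes via PCA.
    neutral_shapes: (n_samples, n_features) array of landmark vectors."""
    mean = neutral_shapes.mean(axis=0)
    centered = neutral_shapes - mean
    # SVD of the centered data gives the principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]  # (n_components, n_features)
    return mean, basis

def distance_from_neutral(shape, mean, basis):
    """Reconstruction error of a face shape w.r.t. the neutral subspace.
    A larger distance suggests a more expressive (less neutral) frame."""
    centered = shape - mean
    projection = basis.T @ (basis @ centered)
    return np.linalg.norm(centered - projection)

def select_peak_frame(frame_shapes, mean, basis):
    """Pick the index of the frame farthest from the neutral subspace."""
    scores = [distance_from_neutral(s, mean, basis) for s in frame_shapes]
    return int(np.argmax(scores))
```

Under this sketch, the apex frame of a video clip is simply the one whose shape vector has the largest residual outside the neutral subspace; the classifier is then applied only to that frame.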