evTransFER: A Transfer Learning Framework for Event-based Facial Expression Recognition

Rodrigo Verschae and Ignacio Bugueno-Cordova

Institute of Engineering Sciences, Universidad de O'Higgins, Chile

Abstract

Event-based cameras are bio-inspired sensors that asynchronously capture pixel-intensity changes with microsecond latency, high temporal resolution, and high dynamic range, thereby providing information about the spatiotemporal dynamics of a scene. We propose evTransFER, a transfer-learning framework for facial expression recognition using event-based cameras. Its main contribution is a feature extractor designed to encode facial spatiotemporal dynamics, built by training an adversarial generative method on a facial reconstruction task and transferring the encoder weights to facial expression recognition. We show that this transfer-learning strategy improves facial expression recognition compared with training a network from scratch. We further propose an architecture that incorporates an LSTM to capture longer-term facial expression dynamics, and we introduce a new event-based representation called TIE. We evaluated the framework on both the synthetic event-based facial expression database e-CK+ and the real neuromorphic dataset NEFER. On e-CK+, evTransFER achieves a recognition rate of 93.6%, surpassing state-of-the-art methods. On NEFER, which comprises event streams with real sensor noise and sparse activity, the proposed transfer-learning strategy reaches an accuracy of up to 76.7%. On both datasets, the results surpass state-of-the-art methods as well as equivalent models trained from scratch.
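The following is a minimal PyTorch sketch, not the authors' implementation: the EventEncoder and ExpressionClassifier names, all layer sizes, and the seven-class output are illustrative assumptions. It only shows the transfer-learning pattern the abstract describes: a convolutional encoder (which in evTransFER would carry the weights pretrained on facial reconstruction) is frozen and feeds an LSTM plus a linear head that classify a sequence of event-frame representations.

# Minimal sketch (not the authors' code) of the transfer-learning
# architecture described above. Layer sizes and names are illustrative.
import torch
import torch.nn as nn

class EventEncoder(nn.Module):
    """Stand-in for the encoder pretrained on facial reconstruction."""

    def __init__(self, in_channels: int = 1, feat_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class ExpressionClassifier(nn.Module):
    """Frozen (transferred) encoder + LSTM over time + linear head."""

    def __init__(self, encoder: EventEncoder, n_classes: int = 7,
                 feat_dim: int = 256, hidden: int = 128):
        super().__init__()
        self.encoder = encoder
        # Transfer learning: keep the pretrained encoder weights fixed.
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, time, channels, H, W) event-frame representations.
        b, t = seq.shape[:2]
        feats = self.encoder(seq.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)           # longer-term dynamics over time
        return self.head(out[:, -1])        # classify from the last step

if __name__ == "__main__":
    model = ExpressionClassifier(EventEncoder())
    frames = torch.randn(2, 8, 1, 128, 128)  # dummy event-frame sequence
    print(model(frames).shape)               # torch.Size([2, 7])

In practice one would load the pretrained weights into the encoder (e.g., via encoder.load_state_dict) and pass only the LSTM and head parameters to the optimizer, so that training updates the classifier while the transferred feature extractor stays fixed.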

BibTeX

@misc{verschae2025evtransfer,
  author = {Verschae, Rodrigo and Bugueno-Cordova, Ignacio},
  title  = {evTransFER: A Transfer Learning Framework for Event-based Facial Expression Recognition},
  year   = {2025},
}