OpenFace 3.0: A Lightweight Multitask System for Comprehensive Facial Behavior Analysis

2025-06-03 13:56:10
Jiewen Hu, Leena Mathur, Paul Pu Liang, Louis-Philippe Morency

Abstract

In recent years, there has been increasing interest in automatic facial behavior analysis systems from computing communities such as vision, multimodal interaction, robotics, and affective computing. Building upon the widespread utility of prior open-source facial analysis systems, we introduce OpenFace 3.0, an open-source toolkit capable of facial landmark detection, facial action unit detection, eye-gaze estimation, and facial emotion recognition. OpenFace 3.0 contributes a lightweight unified model for facial analysis, trained with a multi-task architecture across diverse populations, head poses, lighting conditions, video resolutions, and facial analysis tasks. By leveraging the benefits of parameter sharing through a unified model and training paradigm, OpenFace 3.0 exhibits improvements in prediction performance, inference speed, and memory efficiency over similar toolkits and rivals state-of-the-art models. OpenFace 3.0 can be installed and run with a single line of code, and it operates in real time without specialized hardware. OpenFace 3.0 code for training models and running the system is freely available for research purposes and supports contributions from the community.
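The abstract attributes the memory savings to parameter sharing in a unified multi-task model. The sketch below is not the OpenFace 3.0 architecture or API. It is a toy, standard-library-only illustration of the general idea: one shared backbone feeds several small task heads, so the backbone's parameters are paid for once instead of once per task. All layer sizes, task output sizes, and names (`SharedBackboneModel`, `separate_models_param_count`) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import random

# Hypothetical sizes, for illustration only (not taken from the paper).
FEATURE_DIM = 64
HIDDEN_DIM = 32
TASK_OUTPUTS = {"landmarks": 10, "action_units": 8, "gaze": 2, "emotion": 8}


def make_linear(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(layer, x):
    """Apply a dense layer to the input vector x."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, a) for a in v]


def n_params(layer):
    """Count the weights and biases of one dense layer."""
    weights, biases = layer
    return sum(len(row) for row in weights) + len(biases)


class SharedBackboneModel:
    """One shared backbone layer followed by one small head per task."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.backbone = make_linear(FEATURE_DIM, HIDDEN_DIM, rng)
        self.heads = {
            task: make_linear(HIDDEN_DIM, n_out, rng)
            for task, n_out in TASK_OUTPUTS.items()
        }

    def forward(self, x):
        # The shared representation is computed once and reused by every head.
        shared = relu(linear(self.backbone, x))
        return {task: linear(head, shared) for task, head in self.heads.items()}

    def param_count(self):
        return n_params(self.backbone) + sum(n_params(h) for h in self.heads.values())


def separate_models_param_count():
    """Parameter count if every task had its own backbone plus its own head."""
    rng = random.Random(0)
    total = 0
    for n_out in TASK_OUTPUTS.values():
        total += n_params(make_linear(FEATURE_DIM, HIDDEN_DIM, rng))
        total += n_params(make_linear(HIDDEN_DIM, n_out, rng))
    return total


model = SharedBackboneModel()
outputs = model.forward([0.5] * FEATURE_DIM)
print({task: len(values) for task, values in outputs.items()})
print("unified:", model.param_count(), "separate:", separate_models_param_count())
```

With these toy sizes the unified model has 3004 parameters against 9244 for four independent models, because the 2080-parameter backbone is counted once, not four times. A single forward pass also yields all four task outputs, which is the intuition behind the inference-speed claim, though the real system's gains depend on its actual architecture.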

URL

https://arxiv.org/abs/2506.02891

PDF

https://arxiv.org/pdf/2506.02891.pdf

