Paper Reading AI Learner

A Hybrid Transformer-Sequencer approach for Age and Gender classification from in-wild facial images

2024-03-19 06:40:06
Aakash Singh, Vivek Kumar Singh

Abstract

The advancements in computer vision and image processing techniques have led to the emergence of new applications in domains such as visual surveillance, targeted advertising, content-based searching, and human-computer interaction. Among the various techniques in computer vision, face analysis in particular has gained much attention. Several previous studies have explored facial feature processing for a variety of tasks, including age and gender classification. However, despite these efforts, age and gender classification of in-the-wild human faces is still far from achieving the accuracy required for real-world applications. This paper therefore attempts to bridge this gap by proposing a hybrid model that combines self-attention and BiLSTM approaches for the age and gender classification problem. The proposed model's performance is compared with several state-of-the-art models proposed so far. Improvements of approximately 10% and 6% over state-of-the-art implementations are noted for age and gender classification, respectively. The proposed model is thus found to achieve superior performance and to provide more generalized learning. It can therefore be applied as a core classification component in various image processing and computer vision problems.
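The abstract names self-attention as one of the two components of the hybrid model. As a rough illustration only (the paper's actual architecture, dimensions, and training setup are not given here), a minimal numpy sketch of scaled dot-product self-attention over a sequence of image-feature vectors might look like the following; the shapes and projection matrices are illustrative assumptions, not the authors' configuration:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    x: (seq_len, d_model) input features, e.g. patch embeddings of a face image.
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices (random here).
    Returns: (seq_len, d_k) attention-weighted values.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq_len, seq_len) similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v

# Illustrative usage: 16 hypothetical patch embeddings of dimension 32.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 32))
w_q, w_k, w_v = (rng.standard_normal((32, 32)) * 0.1 for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (16, 32)
```

In the paper's pipeline, the output of such an attention stage would then feed a BiLSTM sequencer before the final age/gender classification heads.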

URL

https://arxiv.org/abs/2403.12483

PDF

https://arxiv.org/pdf/2403.12483.pdf

