Abstract
Hand gesture recognition (HGR) based on multimodal data has attracted considerable attention owing to its great potential in applications. Various manually designed multimodal deep networks have performed well in multimodal HGR (MHGR), but most existing algorithms require considerable expert experience and time-consuming manual trials. To address these issues, we propose an evolutionary network architecture search framework with adaptive multimodal fusion (AMF-ENAS). Specifically, we design an encoding space that simultaneously considers the fusion positions and ratios of the multimodal data, allowing multimodal networks with different architectures to be constructed automatically through decoding. Additionally, we consider three input streams corresponding to intra-modal surface electromyography (sEMG), intra-modal accelerometer (ACC), and inter-modal sEMG-ACC data. To adapt automatically to various datasets, the ENAS framework is designed to search for an MHGR network with appropriate fusion positions and ratios. To the best of our knowledge, this is the first time that ENAS has been utilized in MHGR to tackle issues related to the fusion position and ratio of multimodal data. Experimental results demonstrate that AMF-ENAS achieves state-of-the-art performance on the Ninapro DB2, DB3, and DB7 datasets.
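To make the encoding/decoding idea concrete, the following is a minimal sketch of how a candidate in such a search space might be represented and decoded. All names, the genotype layout, and the backbone depth are assumptions for illustration; the paper's actual encoding space is not specified in the abstract.

```python
import random

# Hypothetical illustration (not the paper's actual encoding): a candidate
# network is a genotype holding a fusion position (the backbone layer at
# which the three streams merge) and fusion ratios (each stream's relative
# channel share in the fused representation).

STREAMS = ("sEMG", "ACC", "sEMG-ACC")  # the three input streams from the abstract
NUM_LAYERS = 6                          # assumed backbone depth

def random_genotype(rng):
    """Sample a fusion position and per-stream fusion ratios."""
    position = rng.randrange(1, NUM_LAYERS)   # layer index where streams fuse
    raw = [rng.random() for _ in STREAMS]
    total = sum(raw)
    ratios = [r / total for r in raw]          # normalize so ratios sum to 1
    return {"position": position, "ratios": ratios}

def decode(genotype, fused_width=96):
    """Decode a genotype into a coarse architecture description."""
    widths = [max(1, round(fused_width * r)) for r in genotype["ratios"]]
    return {
        "pre_fusion": {stream: {"layers": genotype["position"], "channels": w}
                       for stream, w in zip(STREAMS, widths)},
        "post_fusion": {"layers": NUM_LAYERS - genotype["position"]},
    }

rng = random.Random(0)
genotype = random_genotype(rng)
architecture = decode(genotype)
```

In an evolutionary search, a population of such genotypes would be decoded, trained, and scored, with mutation and crossover applied to the position and ratio fields; that loop is omitted here.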
URL
https://arxiv.org/abs/2403.18208