Abstract
Keyword Spotting (KWS) systems with small-footprint models deployed on edge devices face significant accuracy and robustness challenges due to domain shifts caused by varying noise and recording conditions. To address this, we propose a comprehensive continual learning framework designed to adapt to new domains while maintaining computational efficiency. The proposed pipeline integrates a dual-input Convolutional Neural Network that uses both Mel Frequency Cepstral Coefficient (MFCC) and Mel-spectrogram features, supported by a multi-stage denoising process based on discrete wavelet transform and spectral subtraction techniques, together with model and prototype update blocks. Unlike prior methods that restrict updates to specific layers, our approach updates the complete quantized model, made possible by the compact model architecture. A subset of input samples is selected at runtime using class prototypes and confidence-driven filtering; these samples are then pseudo-labeled and combined with a rehearsal buffer for incremental model retraining. Experimental results on noisy test datasets demonstrate the framework's effectiveness, achieving 99.63% accuracy on clean data and maintaining robust performance (exceeding 94% accuracy) across diverse noisy environments, even at a -10 dB Signal-to-Noise Ratio. This work confirms that integrating efficient denoising with prototype-based continual learning enables KWS models to operate autonomously and robustly in resource-constrained, dynamic environments.
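The prototype- and confidence-driven sample selection described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the cosine-similarity metric, and the 0.8 confidence threshold are all assumptions for the sketch.

```python
# Hypothetical sketch of prototype-based, confidence-driven sample selection
# for pseudo-labeling. Metric choice (cosine) and threshold are assumptions.
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def build_prototypes(embeddings, labels, num_classes):
    """Class prototype = mean embedding over the labeled samples of that class."""
    return {c: embeddings[labels == c].mean(axis=0) for c in range(num_classes)}

def select_and_pseudo_label(embedding, prototypes, threshold=0.8):
    """Pseudo-label a runtime sample by its nearest class prototype.

    Returns (pseudo_label, similarity) if the best match clears the
    confidence threshold, otherwise None (the sample is discarded).
    Accepted samples would then be mixed with the rehearsal buffer
    for incremental retraining.
    """
    sims = {c: cosine_sim(embedding, p) for c, p in prototypes.items()}
    best = max(sims, key=sims.get)
    return (best, sims[best]) if sims[best] >= threshold else None
```

In this sketch, raising the threshold trades adaptation speed for pseudo-label purity: fewer runtime samples are accepted, but those accepted are more likely to be labeled correctly.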
URL
https://arxiv.org/abs/2601.16158