Abstract
An environment acoustic model represents how sound is transformed by the physical characteristics of an indoor environment, for any given source/receiver location. Traditional methods for constructing acoustic models involve expensive and time-consuming collection of large quantities of acoustic data at dense spatial locations in the space, or rely on privileged knowledge of scene geometry to intelligently select acoustic data sampling locations. We propose active acoustic sampling, a new task for efficiently building an environment acoustic model of an unmapped environment, in which a mobile agent equipped with visual and acoustic sensors jointly constructs the environment acoustic model and the occupancy map on-the-fly. We introduce ActiveRIR, a reinforcement learning (RL) policy that leverages information from audio-visual sensor streams to guide agent navigation and determine optimal acoustic data sampling positions, yielding a high-quality acoustic model of the environment from a minimal set of acoustic samples. We train our policy with a novel RL reward based on information gain in the environment acoustic model. Evaluating on diverse unseen indoor environments from a state-of-the-art acoustic simulation platform, ActiveRIR outperforms an array of methods, including traditional navigation agents based on spatial novelty and visual exploration as well as existing state-of-the-art approaches.
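The abstract does not give the reward formula, but the idea of an information-gain reward can be illustrated as the reduction in the acoustic model's prediction error after a new sample is added. The following is a minimal sketch under that assumption; the function name, array shapes, and use of mean-squared error on held-out room impulse response (RIR) predictions are all hypothetical, not the paper's actual implementation.

```python
import numpy as np

def acoustic_info_gain_reward(pred_before, pred_after, gt_rirs):
    """Hypothetical reward: drop in acoustic-model error after a sample.

    pred_before / pred_after: (N, T) RIR waveforms predicted at N query
    source/receiver poses, before and after adding the new acoustic sample.
    gt_rirs: (N, T) ground-truth RIRs (available in simulation for training).
    Returns a positive value when the new sample improved the model.
    """
    err_before = np.mean((pred_before - gt_rirs) ** 2)
    err_after = np.mean((pred_after - gt_rirs) ** 2)
    return float(err_before - err_after)

# Toy usage: a sample that shrinks prediction noise yields a positive reward.
rng = np.random.default_rng(0)
gt = rng.standard_normal((4, 16))
before = gt + 0.5 * rng.standard_normal((4, 16))
after = gt + 0.1 * rng.standard_normal((4, 16))
reward = acoustic_info_gain_reward(before, after, gt)
```

Under this framing, a spatially novel but acoustically redundant position earns little reward, which is what steers the agent toward informative sampling locations rather than mere coverage.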
URL
https://arxiv.org/abs/2404.16216