Abstract
Accurate modeling of spatial acoustics is critical for immersive and intelligible audio in confined, resonant environments such as car cabins. Current tuning methods are manual, hardware-intensive, and static, and they fail to account for frequency-selective behaviors and dynamic changes such as passenger presence or seat adjustments. To address these limitations, we propose INFER: Implicit Neural Frequency Response fields, a frequency-domain neural framework that is jointly conditioned on source and receiver positions and orientations and directly learns complex-valued frequency response fields inside such environments. We introduce three key innovations over current neural acoustic modeling methods: (1) a novel end-to-end frequency-domain forward model that directly learns the frequency response field and frequency-specific attenuation in 3D space; (2) perceptual and hardware-aware spectral supervision that emphasizes critical auditory frequency bands and de-emphasizes unstable crossover regions; and (3) a physics-based Kramers-Kronig consistency constraint that regularizes frequency-dependent attenuation and delay. We evaluate our method on real-world data collected in multiple car cabins. Our approach significantly outperforms time- and hybrid-domain baselines on both simulated and real-world automotive datasets, cutting average magnitude and phase reconstruction errors by over 39% and 51%, respectively. INFER sets a new state of the art for neural acoustic modeling in automotive spaces.
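The abstract does not spell out how the Kramers-Kronig consistency constraint is formulated, so the following is only an illustrative sketch of one common form of the idea, not the authors' loss. It assumes a minimum-phase response sampled on a full, even-length FFT grid, in which case the phase is determined by the log-magnitude through a Hilbert-transform relation (computed here by cepstral folding). The function names `minimum_phase_from_magnitude` and `kk_penalty` are hypothetical.

```python
import numpy as np

def minimum_phase_from_magnitude(mag):
    """Phase (radians) implied by |H| under a minimum-phase assumption.

    `mag` holds |H| on a full N-point FFT grid with N even.
    """
    n = mag.shape[-1]
    log_mag = np.log(np.maximum(mag, 1e-12))  # guard against log(0)
    cepstrum = np.fft.ifft(log_mag).real      # real cepstrum of the response
    # Fold the cepstrum onto non-negative quefrencies (causal complex cepstrum).
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1:n // 2] = 2.0
    fold[n // 2] = 1.0
    # FFT of the folded cepstrum is log|H| + j * phase.
    return np.fft.fft(cepstrum * fold).imag

def kk_penalty(h_pred):
    """Mean squared wrapped difference between the predicted phase and the
    phase implied by the predicted magnitude (assumed regularizer form)."""
    phase_kk = minimum_phase_from_magnitude(np.abs(h_pred))
    wrapped_diff = np.angle(h_pred * np.exp(-1j * phase_kk))
    return float(np.mean(wrapped_diff ** 2))

# Toy check: a minimum-phase filter is consistent, its time-reversed
# (maximum-phase) counterpart with the same magnitude is not.
n_fft = 512
h_min = np.fft.fft(np.array([1.0, 0.5]), n_fft)
h_max = np.fft.fft(np.array([0.5, 1.0]), n_fft)
penalty_min = kk_penalty(h_min)
penalty_max = kk_penalty(h_max)
```

Measured cabin responses include propagation delay and are generally not minimum-phase, so a practical constraint would presumably be applied to a delay-compensated or attenuation-related quantity rather than to the raw response as done here.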
URL
https://arxiv.org/abs/2510.07442