Paper Reading AI Learner

INFER: Learning Implicit Neural Frequency Response Fields for Confined Car Cabin

2025-10-08 18:43:12
Harshvardhan C. Takawale, Nirupam Roy, Phil Brown

Abstract

Accurate modeling of spatial acoustics is critical for immersive and intelligible audio in confined, resonant environments such as car cabins. Current tuning methods are manual, hardware-intensive, and static, failing to account for frequency-selective behaviors and dynamic changes like passenger presence or seat adjustments. To address this issue, we propose INFER: Implicit Neural Frequency Response fields, a frequency-domain neural framework that is jointly conditioned on source and receiver positions and orientations to directly learn complex-valued frequency response fields inside confined, resonant environments like car cabins. We introduce three key innovations over current neural acoustic modeling methods: (1) a novel end-to-end frequency-domain forward model that directly learns the frequency response field and frequency-specific attenuation in 3D space; (2) perceptual and hardware-aware spectral supervision that emphasizes critical auditory frequency bands and de-emphasizes unstable crossover regions; and (3) a physics-based Kramers-Kronig consistency constraint that regularizes frequency-dependent attenuation and delay. We evaluate our method on real-world data collected in multiple car cabins. Our approach significantly outperforms time- and hybrid-domain baselines on both simulated and real-world automotive datasets, cutting average magnitude and phase reconstruction errors by over 39% and 51%, respectively. INFER sets a new state-of-the-art for neural acoustic modeling in automotive spaces.
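The abstract does not give the exact form of the band-weighted spectral supervision, so the following is only a minimal sketch of the general idea: per-frequency weights that emphasize one band and de-emphasize a crossover region, applied to magnitude and phase errors of a complex frequency response. The function names, band edges, and weight values are all hypothetical and are not taken from the paper.

```python
import numpy as np

def band_weights(freqs_hz, emphasis=(500.0, 4000.0), crossover=(180.0, 220.0)):
    """Per-bin weights (illustrative values, not from the paper):
    2.0 inside the emphasized band, 0.25 inside the crossover region,
    1.0 everywhere else."""
    w = np.ones_like(freqs_hz, dtype=float)
    w[(freqs_hz >= emphasis[0]) & (freqs_hz <= emphasis[1])] = 2.0
    w[(freqs_hz >= crossover[0]) & (freqs_hz <= crossover[1])] = 0.25
    return w

def weighted_spectral_errors(h_pred, h_true, weights, eps=1e-12):
    """Weighted mean log-magnitude error (dB) and wrapped phase error (rad)
    between two complex-valued frequency responses."""
    mag_pred_db = 20.0 * np.log10(np.abs(h_pred) + eps)
    mag_true_db = 20.0 * np.log10(np.abs(h_true) + eps)
    mag_err = np.average(np.abs(mag_pred_db - mag_true_db), weights=weights)
    # The angle of h_pred * conj(h_true) is the phase difference wrapped to (-pi, pi].
    phase_diff = np.angle(h_pred * np.conj(h_true))
    phase_err = np.average(np.abs(phase_diff), weights=weights)
    return mag_err, phase_err

# Toy example: a delayed first-order low-pass as the "measured" response,
# and a prediction that is 10% too loud with a 0.05 rad phase offset.
freqs = np.linspace(20.0, 8000.0, 400)
h_true = np.exp(-1j * 2 * np.pi * freqs * 1e-3) / (1.0 + 1j * freqs / 1000.0)
h_pred = h_true * 1.1 * np.exp(1j * 0.05)
mag_err, phase_err = weighted_spectral_errors(h_pred, h_true, band_weights(freqs))
```

Comparing phases through the product with the complex conjugate avoids the spurious 2π jumps that subtracting two unwrapped phase curves can produce.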

Abstract (translated)

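Accurately modeling spatial acoustics is essential for immersive, intelligible audio in confined, resonant environments such as car cabins. Current tuning methods are manual, hardware-dependent, and static, and cannot adapt to frequency-selective behavior or to dynamic changes such as passenger presence or seat adjustments. To address this, we propose INFER: Implicit Neural Frequency Response fields, a frequency-domain neural framework that learns complex-valued frequency response fields inside confined, resonant environments (such as car cabins), conditioned jointly on the positions and orientations of the source and receiver. Compared with existing neural acoustic modeling methods, we innovate in three key respects: 1. A novel end-to-end frequency-domain forward model that directly learns the frequency response field and frequency-specific attenuation in 3D space. 2. Perceptual and hardware-aware spectral supervision that emphasizes critical auditory frequency bands and de-emphasizes unstable crossover regions. 3. A physics-based Kramers-Kronig consistency constraint that regularizes frequency-dependent attenuation and delay. We evaluate the method on real-world data from multiple car cabins. Compared with time-domain and hybrid-domain baselines, on both simulated and real-world automotive datasets, our method reduces average magnitude and phase reconstruction errors by over 39% and 51%, respectively. INFER sets a new performance benchmark for neural acoustic modeling in automotive spaces.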
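For reference, the Kramers-Kronig relations named in the third contribution can be stated in their general textbook form. This is background only: the abstract does not say how the paper applies them to attenuation and delay, and the symbols below are chosen for illustration. The signs assume a causal, stable linear response $H(\omega) = H_R(\omega) + i\,H_I(\omega)$ that is analytic in the upper half of the complex $\omega$-plane (time dependence $e^{-i\omega t}$); the opposite convention flips them.

```latex
H_R(\omega) = \frac{1}{\pi}\, \mathcal{P}\!\int_{-\infty}^{\infty}
              \frac{H_I(\omega')}{\omega' - \omega}\, d\omega' ,
\qquad
H_I(\omega) = -\frac{1}{\pi}\, \mathcal{P}\!\int_{-\infty}^{\infty}
              \frac{H_R(\omega')}{\omega' - \omega}\, d\omega'
```

Here $\mathcal{P}$ denotes the Cauchy principal value. The relations say that the real and imaginary parts of a causal response are not independent, which is why they can serve as a consistency check between a learned frequency-dependent attenuation and the accompanying phase or delay.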

URL

https://arxiv.org/abs/2510.07442

PDF

https://arxiv.org/pdf/2510.07442.pdf

