Paper Reading AI Learner

Efficient Sound Field Reconstruction with Conditional Invertible Neural Networks

2024-04-10 11:27:06
Xenofon Karakonstantis, Efren Fernandez-Grande, Peter Gerstoft

Abstract

In this study, we introduce a method for estimating sound fields in reverberant environments using a conditional invertible neural network (CINN). Sound field reconstruction can be hindered by experimental errors, limited spatial data, model mismatches, and long inference times, leading to potentially flawed and prolonged characterizations. Further, the complexity of managing inherent uncertainties often escalates computational demands or is neglected in models. Our approach seeks to balance accuracy and computational efficiency, while incorporating uncertainty estimates to tailor reconstructions to specific needs. By training a CINN with Monte Carlo simulations of random wave fields, our method reduces the dependency on extensive datasets and enables inference from sparse experimental data. The CINN proves versatile at reconstructing Room Impulse Responses (RIRs) by acting either as a likelihood model for maximum a posteriori estimation or as an approximate posterior distribution through amortized Bayesian inference. Compared to traditional Bayesian methods, the CINN achieves similar accuracy with greater efficiency and without requiring adaptation to distinct sound field conditions.
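For readers unfamiliar with conditional invertible networks, the sketch below shows the general idea in PyTorch: affine coupling layers whose scale and shift networks are conditioned on sparse measurements, trained by maximum likelihood on simulated fields and run backwards for amortized posterior sampling. This is a hypothetical illustration with assumed dimensions, layer counts, and conditioning scheme; it is not the authors' architecture.

```python
# Minimal sketch of a conditional invertible network (affine coupling flow).
# Shapes, depths, and the conditioning scheme are illustrative assumptions.
import torch
import torch.nn as nn


class ConditionalCoupling(nn.Module):
    """Affine coupling layer whose scale/shift network also sees a condition
    vector (e.g. sparse pressure measurements)."""

    def __init__(self, dim, cond_dim, hidden=128):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x, cond):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, cond], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)                      # keep the Jacobian well behaved
        z2 = x2 * torch.exp(s) + t
        return torch.cat([x1, z2], dim=1), s.sum(dim=1)

    def inverse(self, z, cond):
        z1, z2 = z[:, :self.half], z[:, self.half:]
        s, t = self.net(torch.cat([z1, cond], dim=1)).chunk(2, dim=1)
        s = torch.tanh(s)
        x2 = (z2 - t) * torch.exp(-s)
        return torch.cat([z1, x2], dim=1)


class CINN(nn.Module):
    """Stack of conditional coupling layers mapping sound-field parameters x
    to a Gaussian latent z, conditioned on sparse observations."""

    def __init__(self, dim, cond_dim, n_layers=6):
        super().__init__()
        self.dim = dim
        self.layers = nn.ModuleList(
            ConditionalCoupling(dim, cond_dim) for _ in range(n_layers))

    def forward(self, x, cond):
        log_det = torch.zeros(x.shape[0], device=x.device)
        for layer in self.layers:
            x, ld = layer(x, cond)
            x = x.flip(dims=[1])               # reverse features so both halves get updated
            log_det = log_det + ld
        return x, log_det

    def sample_posterior(self, cond, n_samples=100):
        """Amortized posterior sampling: draw latents, run the flow backwards.
        `cond` is a single condition vector of shape (cond_dim,)."""
        z = torch.randn(n_samples, self.dim)
        cond = cond.expand(n_samples, -1)
        for layer in reversed(self.layers):
            z = z.flip(dims=[1])               # undo the feature reversal
            z = layer.inverse(z, cond)
        return z


# Toy usage: maximum-likelihood training objective on simulated wave fields.
model = CINN(dim=256, cond_dim=32)
x = torch.randn(8, 256)   # simulated sound-field samples (placeholder shapes)
c = torch.randn(8, 32)    # matching sparse measurements
z, log_det = model(x, c)
nll = (0.5 * z.pow(2).sum(dim=1) - log_det).mean()
```

Training such a flow on Monte Carlo simulations of random wave fields and then calling `sample_posterior` on measured data corresponds, roughly, to the amortized inference workflow the abstract describes; the same negative log-likelihood can also serve as a likelihood term for maximum a posteriori estimation.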


URL

https://arxiv.org/abs/2404.06928

PDF

https://arxiv.org/pdf/2404.06928.pdf

