Paper Reading AI Learner

CoPlay: Audio-agnostic Cognitive Scaling for Acoustic Sensing

2024-03-16 03:59:33
Yin Li, Rajalakshmi Nandakumar

Abstract

Acoustic sensing shows great potential in applications such as health monitoring, gesture interfaces, and imaging by leveraging the speakers and microphones on smart devices. However, ongoing research and development in acoustic sensing often overlooks one problem: the same speaker, when used concurrently for sensing and traditional applications (like playing music), can cause interference in both, making it impractical to use in the real world. The strong ultrasonic sensing signal mixed with music would overload the speaker's mixer. Current solutions to this overload are clipping or down-scaling, both of which degrade music playback quality as well as sensing range and accuracy. To address this challenge, we propose CoPlay, a deep-learning-based optimization algorithm that cognitively adapts the sensing signal. It can 1) maximize the sensing signal magnitude within the available bandwidth left by the concurrent music, to optimize sensing range and accuracy, and 2) minimize any consequential frequency distortion that can affect music playback. In this work, we design a deep learning model and test it on common types of sensing signals (sine wave or frequency-modulated continuous wave, FMCW) as inputs with various agnostic concurrent music and speech. First, we evaluated the model performance to show the quality of the generated signals. Then we conducted field studies of downstream acoustic sensing tasks in the real world. A study with 12 users showed that respiration monitoring and gesture recognition using our adapted signal achieve accuracy similar to the no-concurrent-music scenario, whereas clipping or down-scaling yields worse accuracy. A qualitative study also shows that music playback quality is not degraded, unlike with traditional clipping or down-scaling methods.
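As a rough, hypothetical NumPy sketch (not the paper's implementation), the snippet below illustrates the problem setup and the two baselines the abstract mentions: summing a near-full-scale ultrasonic FMCW chirp with music exceeds the speaker's playback range, and the naive fixes are clipping (which distorts) or down-scaling (which shrinks both music loudness and sensing range). The sample rate, chirp band, and amplitudes are assumed values chosen only for illustration.

```python
# Hypothetical illustration of the overload problem CoPlay targets; all parameter
# values (sample rate, chirp band, amplitudes) are assumptions for this demo.
import numpy as np

FS = 48_000                 # assumed speaker sample rate (Hz)
DUR = 0.01                  # one 10 ms frame
t = np.arange(int(FS * DUR)) / FS

# FMCW sensing chirp sweeping an assumed inaudible band (18-22 kHz), near full scale.
f0, f1 = 18_000.0, 22_000.0
chirp = 0.8 * np.sin(2 * np.pi * (f0 * t + (f1 - f0) / (2 * DUR) * t ** 2))

# Stand-in for concurrent audible music content.
music = 0.6 * np.sin(2 * np.pi * 440.0 * t) + 0.3 * np.sin(2 * np.pi * 880.0 * t)

mix = chirp + music         # peaks above [-1, 1]: overloads the speaker's mixer

# Baseline 1: clipping. Keeps loudness, but the hard nonlinearity adds harmonics
# and intermodulation that distort both the music and the sensing signal.
clipped = np.clip(mix, -1.0, 1.0)

# Baseline 2: down-scaling. Avoids distortion, but attenuates everything,
# lowering music volume and shrinking the sensing range.
scaled = mix / np.max(np.abs(mix))

print(f"peak |mix| = {np.max(np.abs(mix)):.2f}")
print(f"samples clipped: {(np.abs(mix) > 1.0).sum()} of {mix.size}")
print(f"down-scale factor: {1.0 / np.max(np.abs(mix)):.2f}")
```

CoPlay instead adapts the sensing signal itself so that the mix stays within the available headroom left by the music while limiting frequency distortion; the sketch above only reproduces the baselines it is compared against.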

URL

https://arxiv.org/abs/2403.10796

PDF

https://arxiv.org/pdf/2403.10796.pdf

