Paper Reading AI Learner

Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation

2025-05-22 17:54:30
Moru Liu, Hao Dong, Jessica Kelly, Olga Fink, Mario Trapp

Abstract

Out-of-distribution (OOD) detection and segmentation are crucial for deploying machine learning models in safety-critical applications such as autonomous driving and robot-assisted surgery. While prior research has primarily focused on unimodal image data, real-world applications are inherently multimodal, requiring the integration of multiple modalities for improved OOD detection. A key challenge is the lack of supervision signals from unknown data, which leads to overconfident predictions on OOD samples. To address this challenge, we propose Feature Mixing, an extremely simple and fast method for multimodal outlier synthesis with theoretical support, which can be further optimized to help the model better distinguish between in-distribution (ID) and OOD data. Feature Mixing is modality-agnostic and applicable to various modality combinations. Additionally, we introduce CARLA-OOD, a novel multimodal dataset for OOD segmentation, featuring synthetic OOD objects across diverse scenes and weather conditions. Extensive experiments on the SemanticKITTI, nuScenes, and CARLA-OOD datasets, as well as the MultiOOD benchmark, demonstrate that Feature Mixing achieves state-of-the-art performance with a $10\times$ to $370\times$ speedup. Our source code and dataset will be available at this https URL.
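The abstract leaves the mechanism at a high level. As a rough, hypothetical sketch (the mixing rule, tensor shapes, and names below are assumptions for illustration, not the authors' implementation), synthesizing multimodal outliers by mixing ID features across modalities could look like this:

```python
import torch

def feature_mixing(feat_a: torch.Tensor, feat_b: torch.Tensor, p: float = 0.5) -> torch.Tensor:
    """Synthesize pseudo-OOD features from two modalities' ID features.

    feat_a, feat_b: (batch, dim) outputs of two modality encoders
    (e.g., image and point cloud). Hypothetical rule: overwrite a random
    subset of channels in one modality's features with the other
    modality's channels, breaking the cross-modal correlations that
    characterize ID data.
    """
    mask = torch.rand(feat_a.size(1)) < p      # random channel subset
    mixed = feat_a.clone()
    mixed[:, mask] = feat_b[:, mask]           # inject the other modality's channels
    return mixed

# Usage sketch: treat the mixed features as outliers and add an auxiliary
# loss (e.g., an energy or entropy term) that pushes their confidence down.
img_feat = torch.randn(8, 256)   # placeholder image features
pts_feat = torch.randn(8, 256)   # placeholder point-cloud features
ood_feat = feature_mixing(img_feat, pts_feat)
```

Whatever the exact rule, a masked copy like this is a single cheap tensor operation per batch, which is consistent with the reported $10\times$ to $370\times$ speedup over prior outlier-synthesis methods.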

URL

https://arxiv.org/abs/2505.16985

PDF

https://arxiv.org/pdf/2505.16985.pdf

