Paper Reading AI Learner

BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency

2023-03-22 09:33:50
Shuo Yang, Zhaopan Xu, Kai Wang, Yang You, Hongxun Yao, Tongliang Liu, Min Xu

Abstract

As one of the most fundamental techniques in multimodal learning, cross-modal matching aims to project various sensory modalities into a shared feature space. To achieve this, massive and correctly aligned data pairs are required for model training. However, unlike unimodal datasets, multimodal datasets are extremely harder to collect and annotate precisely. As an alternative, the co-occurred data pairs (e.g., image-text pairs) collected from the Internet have been widely exploited in the area. Unfortunately, the cheaply collected dataset unavoidably contains many mismatched data pairs, which have been proven to be harmful to the model's performance. To address this, we propose a general framework called BiCro (Bidirectional Cross-modal similarity consistency), which can be easily integrated into existing cross-modal matching models and improve their robustness against noisy data. Specifically, BiCro aims to estimate soft labels for noisy data pairs to reflect their true correspondence degree. The basic idea of BiCro is motivated by that -- taking image-text matching as an example -- similar images should have similar textual descriptions and vice versa. Then the consistency of these two similarities can be recast as the estimated soft labels to train the matching model. The experiments on three popular cross-modal matching datasets demonstrate that our method significantly improves the noise-robustness of various matching models, and surpass the state-of-the-art by a clear margin.

Abstract (translated)

作为一种在多模态学习中最为重要的技术,跨模态匹配旨在将各种感官模态放到一个共享特征空间中。为了实现这一点,需要大量的、正确地对齐的数据对来进行模型训练。然而,与单模态数据集不同,多模态数据集是非常难以精确收集和标注的。作为一种替代方案,从互联网上收集的共现数据对(如图像-文本对)在该地区被广泛利用。不幸的是,低成本收集的数据集不可避免地包含了许多不匹配的数据对,这些证明对模型性能有害。为了解决这一问题,我们提出了一个名为BiCro(Bidirectional Cross-modal Similarity consistency)的一般性框架,它可以轻松地集成到现有的跨模态匹配模型中,并改进其对噪声数据的稳定性。具体而言,BiCro旨在估计噪声数据对的软标签,以反映它们的真正匹配程度。BiCro的基本思想是由——以图像-文本对为例——相似的图像应该具有相似的文本描述,反之亦然。然后,这两个相似性的一致性可以重新转换为估计的软标签,用于训练匹配模型。对三个流行的跨模态匹配数据集的实验表明,我们的方法显著改进了各种匹配模型的噪声稳定性,并以明显的优势超越了当前的技术水平。

URL

https://arxiv.org/abs/2303.12419

PDF

https://arxiv.org/pdf/2303.12419.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot