A Study on FGSM Adversarial Training for Neural Retrieval

2023-01-25 13:28:54
Simon Lupart, Stéphane Clinchant

Abstract

Neural retrieval models have achieved significant effectiveness gains over term-based methods in the last few years. Nevertheless, those models may be brittle when faced with typos or distribution shifts, and vulnerable to malicious attacks. For instance, several recent papers demonstrated that such variations severely degrade model performance, and then tried to train more resilient models. Usual approaches include synonym replacement or typo injection (as data augmentation) and the use of more robust tokenizers (CharacterBERT, BPE-dropout). To complement this literature, we investigate adversarial training as another possible solution to this robustness issue. Our comparison includes the two main families of BERT-based neural retrievers, i.e. dense and sparse, with and without distillation techniques. We then demonstrate that one of the simplest adversarial training techniques, the Fast Gradient Sign Method (FGSM), can improve the robustness and effectiveness of first-stage rankers. In particular, FGSM increases model performance on both in-domain and out-of-domain distributions, and also on queries with typos, for multiple neural retrievers.
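
For readers unfamiliar with FGSM, the standard formulation perturbs an input x in the direction that increases the loss: x_adv = x + epsilon * sign(grad_x L). The abstract does not say where the paper applies the perturbation or which loss it uses, so the sketch below is only an illustration under stated assumptions: the perturbation is applied to a toy query embedding, and the loss is a simple pairwise softplus loss on dot-product scores. All function names and values are hypothetical and are not taken from the paper.

```python
import math

def dot(a, b):
    """Dot-product relevance score between two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))

def pairwise_loss(q, d_pos, d_neg):
    """Assumed toy loss: softplus(-(score_pos - score_neg)), computed stably."""
    margin = dot(q, d_pos) - dot(q, d_neg)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

def loss_grad_wrt_query(q, d_pos, d_neg):
    """Analytic gradient of pairwise_loss with respect to the query vector."""
    margin = dot(q, d_pos) - dot(q, d_neg)
    # d/dmargin softplus(-margin) = -sigmoid(-margin); fine for toy-sized margins
    weight = -1.0 / (1.0 + math.exp(margin))
    return [weight * (p - n) for p, n in zip(d_pos, d_neg)]

def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def fgsm_perturb(q, d_pos, d_neg, epsilon):
    """One FGSM step: move each coordinate by epsilon along the gradient sign."""
    grad = loss_grad_wrt_query(q, d_pos, d_neg)
    return [x + epsilon * sign(g) for x, g in zip(q, grad)]

# Hypothetical toy embeddings, chosen only for illustration.
q = [0.5, -0.2, 0.1]
d_pos = [1.0, 0.0, 0.5]
d_neg = [0.2, 0.3, -0.4]
epsilon = 0.1

q_adv = fgsm_perturb(q, d_pos, d_neg, epsilon)
clean_loss = pairwise_loss(q, d_pos, d_neg)
adv_loss = pairwise_loss(q_adv, d_pos, d_neg)

# Adversarial training would then minimise a mix of both losses,
# for example clean_loss + adv_loss (the weighting is an assumption).
print(clean_loss, adv_loss)
```

In a real retriever the gradient would come from automatic differentiation through the encoder rather than a hand-written formula. The point here is only that the perturbed input stays within epsilon of the original in every coordinate while raising the training loss.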


URL

https://arxiv.org/abs/2301.10576

PDF

https://arxiv.org/pdf/2301.10576.pdf

