Isotropic Representation Can Improve Dense Retrieval

2022-09-01 04:29:38
Euna Jung, Jungwon Park, Jaekeol Choi, Sungyoon Kim, Wonjong Rhee

Abstract

Recent advances in language representation modeling have broadly affected the design of dense retrieval models. In particular, many high-performing dense retrieval models compute query and document representations with BERT and then apply cosine-similarity-based scoring to determine relevance. BERT representations, however, are known to follow an anisotropic, narrow-cone-shaped distribution, and such a distribution can be undesirable for cosine-similarity-based scoring. In this work, we first show that the representations of BERT-based dense retrieval (DR) models also follow an anisotropic distribution. To cope with the problem, we introduce two unsupervised post-processing methods, Normalizing Flow and whitening, and develop a token-wise method, in addition to the sequence-wise method, for applying the post-processing to the representations of DR models. We show that the proposed methods effectively make the representations isotropic, and then perform experiments with ColBERT and RepBERT showing that the document re-ranking performance (NDCG@10) can be improved by 5.17% to 8.09% for ColBERT and 6.88% to 22.81% for RepBERT. To examine the potential of isotropic representations for improving the robustness of DR models, we investigate out-of-distribution tasks where the test dataset differs from the training dataset. The results show that isotropic representations achieve generally improved performance. For instance, when the training dataset is MS-MARCO and the test dataset is Robust04, isotropy post-processing improves the baseline performance by up to 24.98%. Furthermore, we show that an isotropic model trained on an out-of-distribution dataset can even outperform a baseline model trained on the in-distribution dataset.
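
The whitening branch of the post-processing is simple enough to sketch. Below is a minimal, illustrative implementation of sequence-wise whitening in NumPy, not taken from the paper's code: the function names and the random placeholder embeddings are hypothetical, and the assumption is that representations are available as an (n, d) array. The transform is fitted on a collection of embeddings and applied to both queries and documents before cosine scoring; the token-wise variant described in the abstract would fit the same transform on individual token representations instead.

```python
import numpy as np

def fit_whitening(embeddings: np.ndarray, eps: float = 1e-8):
    """Estimate a whitening transform from sequence-wise embeddings.

    embeddings: (n, d) array. Returns mean mu and matrix W such that
    (x - mu) @ W has approximately zero mean and identity covariance.
    """
    mu = embeddings.mean(axis=0)
    cov = np.cov(embeddings - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # cov is symmetric PSD
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps))
    return mu, W

def whiten(x: np.ndarray, mu: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Apply the fitted transform to a single vector or a batch."""
    return (x - mu) @ W

# Hypothetical usage: random arrays stand in for BERT embeddings.
docs = np.random.randn(1000, 768)   # document representations, (n, d)
query = np.random.randn(768)        # one query representation, (d,)
mu, W = fit_whitening(docs)
docs_w, query_w = whiten(docs, mu, W), whiten(query, mu, W)
# Cosine-similarity scoring in the whitened (isotropic) space.
scores = docs_w @ query_w / (
    np.linalg.norm(docs_w, axis=1) * np.linalg.norm(query_w)
)
```

Because whitening equalizes the variance along every direction of the embedding space, the post-processed representations no longer occupy a narrow cone, which is the property the paper exploits for cosine-based relevance scoring.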

URL

https://arxiv.org/abs/2209.00218

PDF

https://arxiv.org/pdf/2209.00218.pdf

