Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation

2024-05-01 07:44:28
Yoori Oh, Yoseob Han, Kyogu Lee

Abstract

There has been growing interest in audio-language retrieval research, where the objective is to establish the correlation between the audio and text modalities. However, in most audio-text paired datasets the text data lacks rich expression compared to the audio samples. One significant challenge facing audio-text datasets is the presence of similar or identical captions for different audio samples. Under such many-to-one mapping conditions, audio-text datasets lead to poor performance in retrieval tasks. In this paper, we propose a novel approach to tackle the data imbalance problem in the audio-language retrieval task. To overcome this limitation, we introduce a method that employs a distance sampling-based paraphraser leveraging ChatGPT, using a distance function to generate a controllable distribution of manipulated text data. For a set of sentences with the same context, the distance is used to calculate the degree of manipulation between any two sentences, and ChatGPT's few-shot prompting is performed using a text cluster with a similar distance, defined by the Jaccard similarity. As a result, ChatGPT, when applied to few-shot prompting with text clusters, can adjust the diversity of the manipulated text based on the distance. The proposed approach is shown to significantly enhance performance in audio-text retrieval, outperforming conventional text augmentation techniques.
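
The idea of grouping sentence pairs by distance and using one group as few-shot examples can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not give the exact distance function, so the sketch uses word-level Jaccard distance (one minus Jaccard similarity), equal-width distance bins as a stand-in for the text clusters, and hypothetical helper names (`jaccard_distance`, `bin_pairs_by_distance`, `build_few_shot_prompt`). The prompt wording is also invented, and no ChatGPT call is made.

```python
from itertools import combinations


def jaccard_distance(a: str, b: str) -> float:
    """Word-level Jaccard distance: 1 - |A & B| / |A | B| (assumed form)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    if not union:
        # Two empty sentences are treated as identical.
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def bin_pairs_by_distance(captions, num_bins=4):
    """Group all caption pairs into equal-width distance bins over [0, 1].

    Each bin stands in for a "text cluster with a similar distance".
    """
    bins = {i: [] for i in range(num_bins)}
    for a, b in combinations(captions, 2):
        d = jaccard_distance(a, b)
        # A distance of exactly 1.0 falls into the last bin.
        idx = min(int(d * num_bins), num_bins - 1)
        bins[idx].append((a, b, d))
    return bins


def build_few_shot_prompt(pairs, target, max_examples=3):
    """Assemble a few-shot prompt from pairs drawn from a single bin."""
    lines = [
        "Paraphrase the sentence, changing it about as much as in the examples."
    ]
    for src, tgt, _ in pairs[:max_examples]:
        lines.append(f"Input: {src}\nOutput: {tgt}")
    lines.append(f"Input: {target}\nOutput:")
    return "\n\n".join(lines)


if __name__ == "__main__":
    captions = ["a dog barks", "a dog howls", "rain falls"]
    bins = bin_pairs_by_distance(captions, num_bins=4)
    # Pick the bin whose distance range matches the desired degree of change.
    print(build_few_shot_prompt(bins[2], "a car passes by"))
```

Under these assumptions, choosing a low-distance bin yields examples that differ only slightly, while a high-distance bin yields examples that differ heavily, which is how the distance would steer the diversity of the generated paraphrases.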

URL

https://arxiv.org/abs/2405.00367

PDF

https://arxiv.org/pdf/2405.00367.pdf

