Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching

Abstract

Image-text matching remains a challenging task due to heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Different from previous approaches focusing on enhancing multi-modal representations or exploiting cross-modal correspondence for more accurate retrieval, in this paper we aim to leverage the knowledge transfer between peer branches in a boosting manner to seek a more powerful matching model. Specifically, we propose a brand-new Deep Boosting Learning (DBL) algorithm, where an anchor branch is first trained to provide insights into the data properties, with a target branch gaining more advanced knowledge to develop optimal features and distance metrics. Concretely, an anchor branch initially learns the absolute or relative distance between positive and negative pairs, providing a foundational understanding of the particular network and data distribution. Building upon this knowledge, a target branch is concurrently tasked with more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples. Extensive experiments validate that our DBL can achieve impressive and consistent improvements based on various recent state-of-the-art models in the image-text matching field, and outperform related popular cooperative strategies, e.g., Conventional Distillation, Mutual Learning, and Contrastive Learning. Beyond the above, we confirm that DBL can be seamlessly integrated into their training scenarios and achieve superior performance under the same computational costs, demonstrating the flexibility and broad applicability of our proposed method. Our code is publicly available at: this https URL.
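
The two-branch idea described above can be illustrated with a small sketch. The abstract does not give the loss formulas, so everything here is an assumption made for illustration: the hinge-style triplet loss, the `base_margin` value, and in particular the `adaptive_margin` rule (the target branch must exceed the gap the anchor branch already achieved) are hypothetical and are not taken from the paper or its released code.

```python
def triplet_hinge(sim_pos, sim_neg, margin):
    """Hinge ranking loss for one triplet.

    Penalizes the model when the matched pair's similarity does not exceed
    the unmatched pair's similarity by at least `margin`.
    """
    return max(0.0, margin - sim_pos + sim_neg)


def adaptive_margin(anchor_sim_pos, anchor_sim_neg, base_margin=0.2):
    """Hypothetical adaptive margin (an assumption, not the paper's rule).

    The target branch is asked to beat the relative distance the anchor
    branch already reached, and the margin never drops below `base_margin`.
    """
    anchor_gap = anchor_sim_pos - anchor_sim_neg
    return max(base_margin, anchor_gap + base_margin)


def boosting_losses(anchor_sims, target_sims, base_margin=0.2):
    """Return (anchor_loss, target_loss) for one triplet.

    `anchor_sims` and `target_sims` are (positive, negative) similarity
    pairs produced by the anchor and target branches respectively.
    """
    anchor_pos, anchor_neg = anchor_sims
    target_pos, target_neg = target_sims
    # Anchor branch: plain fixed-margin constraint.
    anchor_loss = triplet_hinge(anchor_pos, anchor_neg, base_margin)
    # Target branch: margin derived from what the anchor branch learned.
    margin = adaptive_margin(anchor_pos, anchor_neg, base_margin)
    target_loss = triplet_hinge(target_pos, target_neg, margin)
    return anchor_loss, target_loss


if __name__ == "__main__":
    # The anchor already separates this triplet, so the target branch
    # faces a stricter margin than the fixed base margin.
    print(boosting_losses(anchor_sims=(0.7, 0.4), target_sims=(0.8, 0.4)))
```

In this sketch a triplet that the anchor branch already separates well yields a larger margin for the target branch, which is one simple way to read "more adaptive margin constraints"; the actual DBL formulation may differ.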

Abstract (translated)

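Image-text matching remains a challenging task because of heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Unlike previous approaches that focus on enhancing multi-modal representations or exploiting cross-modal correspondence for more accurate retrieval, this paper aims to use knowledge transfer between peer branches, in a boosting manner, to obtain a more powerful matching model. Specifically, we propose a brand-new Deep Boosting Learning (DBL) algorithm, in which an anchor branch is first trained to gain insight into the data properties, while a target branch acquires richer knowledge to develop optimal features and distance metrics. Concretely, the anchor branch initially learns the absolute or relative distance between positive and negative pairs, providing a foundational understanding of the particular network and data distribution. On this basis, the target branch is concurrently given more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples. Extensive experiments confirm that DBL achieves impressive and consistent improvements on top of various recent state-of-the-art image-text matching models, and outperforms related popular cooperative strategies such as Conventional Distillation, Mutual Learning, and Contrastive Learning. In addition, we confirm that DBL can be seamlessly integrated into their training scenarios and achieve superior performance at the same computational cost, demonstrating the flexibility and broad applicability of the proposed method. Our code is publicly available at: this https URL.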

URL

https://arxiv.org/abs/2404.18114

PDF

https://arxiv.org/pdf/2404.18114.pdf
