Soft-Prompting with Graph-of-Thought for Multi-modal Representation Learning

Abstract

The chain-of-thought technique has been well received in multi-modal tasks. It is a step-by-step linear reasoning process that adjusts the length of the chain to improve the performance of generated prompts. However, human thought processes are predominantly non-linear: they encompass multiple aspects simultaneously and employ dynamic adjustment and updating mechanisms. We therefore propose a novel Aggregation-Graph-of-Thought (AGoT) mechanism for soft-prompt tuning in multi-modal representation learning. AGoT models the human thought process not only as a chain but also treats each step as a reasoning aggregation graph, capturing the multiple aspects of thinking that single-step reasoning overlooks. This turns the entire reasoning process into prompt aggregation and prompt flow operations. Experiments show that our multi-modal model enhanced with AGoT soft-prompting achieves good results on several tasks, including text-image retrieval, visual question answering, and image recognition. In addition, we demonstrate that it has good domain generalization performance due to better reasoning.
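
The two operations named in the abstract, prompt aggregation and prompt flow, can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not say how aggregation weights or the flow update are computed, so the softmax weighting, the fixed blending factor `alpha`, and the names `aggregate`, `flow`, and `agot_chain` are all assumptions made for illustration. Prompts are represented as plain lists of floats.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(aspects: Sequence[Vector], scores: Sequence[float]) -> Vector:
    """Prompt aggregation (assumed form): softmax-weighted sum of the
    aspect prompts that make up one reasoning step."""
    weights = softmax(scores)
    dim = len(aspects[0])
    return [sum(w * a[i] for w, a in zip(weights, aspects)) for i in range(dim)]


def flow(previous: Vector, aggregated: Vector, alpha: float) -> Vector:
    """Prompt flow (assumed form): blend the previous step's prompt with
    the newly aggregated one using a fixed factor alpha in [0, 1]."""
    return [(1.0 - alpha) * p + alpha * a for p, a in zip(previous, aggregated)]


def agot_chain(
    initial: Vector,
    steps: Sequence[Tuple[Sequence[Vector], Sequence[float]]],
    alpha: float = 0.5,
) -> Vector:
    """Run the chain: each step aggregates its aspect prompts, then the
    result flows into the running prompt."""
    prompt = list(initial)
    for aspects, scores in steps:
        prompt = flow(prompt, aggregate(aspects, scores), alpha)
    return prompt


if __name__ == "__main__":
    # Two steps: the first has two equally weighted aspects, the second has one.
    steps = [
        ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
        ([[1.0, 1.0]], [0.0]),
    ]
    print(agot_chain([0.0, 0.0], steps, alpha=0.5))
```

In this sketch the chain is the outer loop and the graph is the set of aspect prompts feeding each step. A learned model would presumably replace the hand-set scores and the fixed `alpha` with trainable or input-dependent quantities.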

URL

https://arxiv.org/abs/2404.04538

PDF

https://arxiv.org/pdf/2404.04538.pdf
