Abstract
In patent prosecution, image-based retrieval systems that identify similarities between current patent images and prior art are pivotal to ensuring the novelty and non-obviousness of patent applications. Despite growing interest in recent years, existing attempts, while effective at recognizing images within the same patent, fail to deliver practical value due to their limited generalizability in retrieving relevant prior art. Moreover, this task inherently involves challenges posed by the abstract visual features of patent images, the skewed distribution of image classifications, and the semantic information in image descriptions. We therefore propose a language-informed, distribution-aware multimodal approach to patent image feature learning, which enriches the semantic understanding of patent images by integrating Large Language Models and improves performance on underrepresented classes with our proposed distribution-aware contrastive losses. Extensive experiments on the DeepPatent2 dataset show that our method achieves state-of-the-art or comparable performance in image-based patent retrieval, with gains of +53.3% mAP, +41.8% Recall@10, and +51.9% MRR@10. Furthermore, through an in-depth user analysis, we examine how our model aids patent professionals in their image retrieval efforts, highlighting its real-world applicability and effectiveness.
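The "distribution-aware contrastive loss" mentioned above could take several forms; the paper's exact formulation is not given here. A minimal sketch of one plausible variant is a supervised contrastive (InfoNCE-style) loss in which each anchor is weighted by the inverse frequency of its class, so that underrepresented classes contribute more to the gradient. All names and the weighting scheme below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def distribution_aware_contrastive_loss(embeddings, labels, class_counts,
                                        temperature=0.1):
    """Supervised contrastive loss with inverse-class-frequency anchor
    weights (a sketch of the 'distribution-aware' idea, not the paper's
    exact loss).

    embeddings:   (n, d) array of image features
    labels:       length-n list of class labels
    class_counts: dict mapping label -> number of samples in that class
    """
    # L2-normalize so dot products become cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)

    # Exclude self-similarity from the softmax denominator.
    not_self = ~np.eye(n, dtype=bool)
    sim_masked = np.where(not_self, sim, -np.inf)
    log_prob = sim - np.log(np.exp(sim_masked).sum(axis=1, keepdims=True))

    loss, total_weight = 0.0, 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchor has no positive pair in the batch
        # Inverse-frequency weight: rare classes count for more.
        w = 1.0 / class_counts[labels[i]]
        loss += -w * np.mean([log_prob[i, j] for j in positives])
        total_weight += w
    return loss / total_weight
```

In a batch containing both a common and a rare class, the rare class's anchors receive proportionally larger weights, which counteracts the skewed class distribution the abstract describes.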
URL
https://arxiv.org/abs/2404.19360