Abstract
Batteries are essential for applications such as electric vehicles and renewable energy storage, which makes their safety and efficiency critical concerns. Anomaly detection in battery thermal images helps identify failures early, but traditional deep learning methods require extensive labeled data. Such data is difficult to obtain, especially for anomalies, because of safety risks and high data collection costs. To overcome this, we explore zero-shot anomaly detection using Visual Question Answering (VQA) models, which leverage pretrained knowledge and text-based prompts to generalize across vision tasks. By incorporating prior knowledge of normal battery thermal behavior, we design prompts that detect anomalies without battery-specific training data. We evaluate three VQA models (ChatGPT-4o, LLaVa-13b, and BLIP-2), analyzing their robustness to prompt variations, their consistency across repeated trials, and their qualitative outputs. Despite the lack of fine-tuning on battery data, our approach performs competitively with state-of-the-art models trained on battery data. Our findings highlight the potential of VQA-based zero-shot learning for battery anomaly detection and suggest future directions for improving its effectiveness.
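The prompt-based workflow described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the prompt wording, the `ask_vqa` callable (a stand-in for whichever VQA model is queried), the yes/no parsing, and the majority vote over repeated trials are not taken from the paper, which does not give these details in the abstract.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical prompt: encodes an assumed description of normal thermal
# behavior. The paper's actual prompts are not given in the abstract.
PROMPT = (
    "This is a thermal image of a battery. A normal battery shows a smooth, "
    "roughly uniform temperature distribution. Does this image show an "
    "anomaly? Answer yes or no."
)


def parse_answer(text: str) -> Optional[bool]:
    """Map a free-text model reply to True (anomaly), False (normal), or None."""
    words = text.strip().lower().replace(",", " ").replace(".", " ").split()
    if not words:
        return None
    if words[0] == "yes":
        return True
    if words[0] == "no":
        return False
    return None  # reply did not start with a clear yes/no


def detect_anomaly(
    image_path: str,
    ask_vqa: Callable[[str, str], str],
    trials: int = 5,
) -> bool:
    """Query the model several times and take a majority vote (an assumption).

    `ask_vqa(image_path, prompt)` is a hypothetical wrapper that returns the
    model's text answer; unparseable replies are counted but do not vote.
    """
    votes = Counter()
    for _ in range(trials):
        votes[parse_answer(ask_vqa(image_path, PROMPT))] += 1
    return votes[True] > votes[False]
```

Keeping the model behind a plain callable means the same prompt and voting logic could be reused across different VQA backends, and repeating the query gives a simple way to observe how stable the answers are across trials.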
URL
https://arxiv.org/abs/2505.16674