
DroneSR: Rethinking Few-shot Thermal Image Super-Resolution from Drone-based Perspective

2025-09-02 02:37:42
Zhipeng Weng, Xiaopeng Liu, Ce Liu, Xingyuan Guo, Yukai Shi, Liang Lin

Abstract

Although large-scale models achieve significant performance improvements, overfitting still frequently undermines their generalization ability. In image super-resolution, diffusion models, as representative generative models, typically adopt large-scale architectures. However, few-shot drone-captured infrared training data frequently induces severe overfitting in such architectures. To address this key challenge, we propose a new Gaussian quantization representation learning method for diffusion models that alleviates overfitting and enhances robustness. In addition, an effective monitoring mechanism tracks large-scale architectures during training to detect signs of overfitting. By introducing Gaussian quantization representation learning, our method effectively reduces overfitting while maintaining architectural complexity. On this basis, we construct a multi-source drone-based infrared image benchmark dataset for detection and use it to highlight the overfitting issues of large-scale architectures in few-shot, diverse drone-based image reconstruction scenarios. To verify the efficacy of the method in mitigating overfitting, experiments are conducted on the constructed benchmark. Experimental results demonstrate that our method outperforms existing super-resolution approaches and significantly mitigates overfitting of large-scale architectures under complex conditions. The code and DroneSR dataset will be available at: this https URL.
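
The abstract does not describe how the monitoring mechanism works, so the following is only a generic illustration of what tracking overfitting during training can look like, not the authors' method. It is a minimal sketch that assumes overfitting shows up as validation loss rising while training loss keeps falling; the class name `OverfitMonitor`, the `patience` parameter, and the loss values in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OverfitMonitor:
    """Hypothetical monitor (not the paper's mechanism).

    Flags overfitting when, over the last `patience` consecutive checks,
    validation loss rose every time while training loss fell every time.
    """
    patience: int = 3
    train_losses: List[float] = field(default_factory=list)
    val_losses: List[float] = field(default_factory=list)

    def update(self, train_loss: float, val_loss: float) -> bool:
        """Record one evaluation step and return the current overfitting flag."""
        self.train_losses.append(train_loss)
        self.val_losses.append(val_loss)
        return self.is_overfitting()

    def is_overfitting(self) -> bool:
        n = self.patience
        # Need n + 1 points to observe n consecutive changes.
        if len(self.val_losses) <= n:
            return False
        recent_val = self.val_losses[-(n + 1):]
        recent_train = self.train_losses[-(n + 1):]
        val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
        train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
        return val_rising and train_falling
```

In a training loop, one would call `monitor.update(train_loss, val_loss)` after each validation pass and react when it returns `True`, for example by stopping early or strengthening regularization. With `patience=2` and illustrative losses `(1.0, 1.0)`, `(0.8, 0.9)`, `(0.6, 1.0)`, `(0.5, 1.1)`, the flag is raised on the fourth update, when validation loss has risen twice in a row while training loss kept falling.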

URL

https://arxiv.org/abs/2509.01898

PDF

https://arxiv.org/pdf/2509.01898.pdf

