This thesis presents novel contributions in two primary areas: improving the efficiency of generative models, particularly normalizing flows, and applying generative models to real-world computer vision challenges. The first part introduces six improvements to normalizing flow architectures: 1) invertible 3x3 convolution layers, with mathematically proven necessary and sufficient conditions for invertibility; 2) a more efficient Quad-coupling layer; 3) a fast, parallel inversion algorithm for kxk convolutional layers; 4) a fast and efficient backpropagation algorithm for the inverse of convolution; 5) Inverse-Flow, which uses the inverse of convolution in the forward pass and is trained with the proposed backpropagation algorithm; and 6) Affine-StableSR, a compact and efficient super-resolution model that leverages pre-trained weights and normalizing flow layers to reduce parameter count while maintaining performance. The second part presents five applications: 1) an automated quality assessment system for agricultural produce that uses Conditional GANs to address class imbalance, data scarcity, and annotation challenges, achieving good accuracy in seed purity testing; 2) an unsupervised geological mapping framework that uses stacked autoencoders for dimensionality reduction, showing improved feature extraction compared to conventional methods; 3) a privacy-preserving method for autonomous driving datasets based on face detection and image inpainting; 4) Stable Diffusion-based image inpainting that replaces detected faces and license plates, advancing privacy-preserving techniques and ethical considerations in the field; and 5) an adapted diffusion model for art restoration that effectively handles multiple types of degradation through unified fine-tuning.
https://arxiv.org/abs/2512.04039
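To make the idea of inverting a convolution concrete, here is a minimal single-channel sketch. It is not the thesis's algorithm: it assumes zero padding on the top and left only, which turns the convolution into a lower-triangular linear system whose diagonal entries all equal the bottom-right kernel weight, so the map is invertible exactly when that weight is nonzero. The inverse below is a plain sequential substitution in raster order, not the parallel inversion the abstract describes, and the function names `causal_conv2d` and `invert_causal_conv2d` are hypothetical.

```python
import numpy as np

def causal_conv2d(x, kernel):
    """Convolve a 2D array with a k x k kernel, zero-padding top and left only.

    With this (assumed) padding, y[i, j] depends only on x[a, b] with
    a <= i and b <= j, and kernel[-1, -1] multiplies x[i, j] itself.
    """
    k = kernel.shape[0]
    h, w = x.shape
    xp = np.pad(x, ((k - 1, 0), (k - 1, 0)))
    y = np.zeros((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            y[i, j] = np.sum(kernel * xp[i:i + k, j:j + k])
    return y

def invert_causal_conv2d(y, kernel):
    """Recover x from y by forward substitution in raster order."""
    if kernel[-1, -1] == 0:
        raise ValueError("bottom-right kernel weight must be nonzero")
    k = kernel.shape[0]
    h, w = y.shape
    # Padded buffer for the reconstruction; unknown entries start at zero.
    xp = np.zeros((h + k - 1, w + k - 1), dtype=float)
    for i in range(h):
        for j in range(w):
            # Every entry in the window except the bottom-right one is
            # already known; the bottom-right one is still zero here.
            partial = np.sum(kernel * xp[i:i + k, j:j + k])
            xp[i + k - 1, j + k - 1] = (y[i, j] - partial) / kernel[-1, -1]
    return xp[k - 1:, k - 1:]
```

Each output pixel is solved from pixels above and to the left of it, which is the dependency structure a parallel scheme would have to exploit; this sketch simply visits the pixels one at a time.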
Detecting deepfake images is crucial in combating misinformation. We present a lightweight, generalizable binary classification model based on EfficientNet-B6, fine-tuned with transformation techniques to address severe class imbalances. By leveraging robust preprocessing, oversampling, and optimization strategies, our model achieves high accuracy, stability, and generalization. While incorporating Fourier transform-based phase and amplitude features showed minimal impact, our proposed framework helps non-experts to effectively identify deepfake images, making significant strides toward accessible and reliable deepfake detection.
https://arxiv.org/abs/2511.19187
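The Fourier transform-based amplitude and phase features mentioned in the abstract can be illustrated with a short NumPy sketch. The abstract does not say how the features were computed, so everything here is an assumption: the input is taken to be a single-channel image, the amplitude is log-scaled, and the function name `fourier_amplitude_phase` is hypothetical.

```python
import numpy as np

def fourier_amplitude_phase(image):
    """Return log-amplitude and phase maps of a 2D grayscale image.

    Assumed preprocessing, not taken from the paper: the spectrum is
    shifted so the zero-frequency term sits at the center, and the
    amplitude is compressed with log1p.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    amplitude = np.log1p(np.abs(spectrum))
    phase = np.angle(spectrum)  # values in [-pi, pi]
    return amplitude, phase
```

The two maps have the same shape as the input, so one plausible use is to stack them as extra channels alongside the image before feeding a classifier.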