This study utilizes graph theory and deep learning to assess variations in Alzheimer's disease (AD) neuropathologies, focusing on classic (cAD) and rapid (rpAD) progression forms. It analyses the distribution of amyloid plaques and tau tangles in postmortem brain tissues. Histopathological images are converted into tau-pathology-based graphs, and derived metrics are used for statistical analysis and in machine learning classifiers. These classifiers incorporate SHAP value explainability to differentiate between cAD and rpAD. Graph neural networks (GNNs) demonstrate greater efficiency than traditional CNN methods in analyzing this data, preserving spatial pathology context. Additionally, GNNs provide significant insights through explainable AI techniques. The analysis shows denser networks in rpAD and a distinctive impact on brain cortical layers: rpAD predominantly affects middle layers, whereas cAD influences both superficial and deep layers of the same cortical regions. These results suggest a unique neuropathological network organization for each AD variant.
The increasing use of digital technologies and mobile-based registration procedures highlights the vital role of personal identity documents (IDs) in verifying users and safeguarding sensitive information. However, the rise in counterfeit ID production poses a significant challenge, necessitating the development of reliable and efficient automated verification methods. This paper introduces IDTrust, a deep-learning framework for assessing the quality of IDs. IDTrust is a system that enhances the quality of identification documents by using a deep learning-based approach. This method eliminates the need for relying on original document patterns for quality checks and pre-processing steps for alignment. As a result, it offers significant improvements in terms of dataset applicability. By utilizing a bandpass filtering-based method, the system aims to effectively detect and differentiate ID quality. Comprehensive experiments on the MIDV-2020 and L3i-ID datasets identify optimal parameters, significantly improving discrimination performance and effectively distinguishing between original and scanned ID documents.
Deep learning-based methods have achieved prestigious performance for magnetic resonance imaging (MRI) reconstruction, enabling fast imaging for many clinical applications. Previous methods employ convolutional networks to learn the image prior as the regularization term. In quantitative MRI, the physical model of nuclear magnetic resonance relaxometry is known, providing additional prior knowledge for image reconstruction. However, traditional reconstruction networks are limited to learning the spatial domain prior knowledge, ignoring the relaxometry prior. Therefore, we propose a relaxometry-guided quantitative MRI reconstruction framework to learn the spatial prior from data and the relaxometry prior from MRI physics. Additionally, we also evaluated the performance of two popular reconstruction backbones, namely, recurrent variational networks (RVN) and variational networks (VN) with U- Net. Experiments demonstrate that the proposed method achieves highly promising results in quantitative MRI reconstruction.
This study presents a novel approach for reconstructing cone beam computed tomography (CBCT) for specific orbits using known operator learning. Unlike traditional methods, this technique employs a filtered backprojection type (FBP-type) algorithm, which integrates a unique, adaptive filtering process. This process involves a series of operations, including weightings, differentiations, the 2D Radon transform, and backprojection. The filter is designed for a specific orbit geometry and is obtained using a data-driven approach based on deep learning. The approach efficiently learns and optimizes the orbit-related component of the filter. The method has demonstrated its ability through experimentation by successfully learning parameters from circular orbit projection data. Subsequently, the optimized parameters are used to reconstruct images, resulting in outcomes that closely resemble the analytical solution. This demonstrates the potential of the method to learn appropriate parameters from any specific orbit projection data and achieve reconstruction. The algorithm has demonstrated improvement, particularly in enhancing reconstruction speed and reducing memory usage for handling specific orbit reconstruction.
In this paper, we propose a deep learning based model for Acoustic Anomaly Detection of Machines, the task for detecting abnormal machines by analysing the machine sound. By conducting extensive experiments, we indicate that multiple techniques of pseudo audios, audio segment, data augmentation, Mahalanobis distance, and narrow frequency bands, which mainly focus on feature engineering, are effective to enhance the system performance. Among the evaluating techniques, the narrow frequency bands presents a significant impact. Indeed, our proposed model, which focuses on the narrow frequency bands, outperforms the DCASE baseline on the benchmark dataset of DCASE 2022 Task 2 Development set. The important role of the narrow frequency bands indicated in this paper inspires the research community on the task of Acoustic Anomaly Detection of Machines to further investigate and propose novel network architectures focusing on the frequency bands.
在本文中，我们提出了一个基于深度学习的机器异常检测模型，该任务是通过分析机器声音来检测异常机器。通过进行广泛的实验，我们表明多种伪音频技术、音频段、数据增强、马兰诺比距离和窄带宽技术，主要关注特征工程，可以有效提高系统性能。在评估技术中，窄带宽技术表现出显著的影响。事实上，我们提出的模型，重点在窄带宽上，在 DCASE 2022 任务 2 开发集的基准数据集上超过了 DCASE 基线。本文中窄带宽技术的重要作用启发了研究社区，使他们进一步研究机器异常检测任务，并提出关注频带的新的网络架构。
Multimodal data analysis and validation based on streams from state-of-the-art sensor technology such as eye-tracking or emotion recognition using the Facial Action Coding System (FACTs) with deep learning allows educational researchers to study multifaceted learning and problem-solving processes and to improve educational experiences. This study aims to investigate the correlation between two continuous sensor streams, pupil diameter as an indicator of cognitive workload and FACTs with deep learning as an indicator of emotional arousal (RQ 1a), specifically for epochs of high, medium, and low arousal (RQ 1b). Furthermore, the time lag between emotional arousal and pupil diameter data will be analyzed (RQ 2). 28 participants worked on three cognitively demanding and emotionally engaging everyday moral dilemmas while eye-tracking and emotion recognition data were collected. The data were pre-processed in Phyton (synchronization, blink control, downsampling) and analyzed using correlation analysis and Granger causality tests. The results show negative and statistically significant correlations between the data streams for emotional arousal and pupil diameter. However, the correlation is negative and significant only for epochs of high arousal, while positive but non-significant relationships were found for epochs of medium or low arousal. The average time lag for the relationship between arousal and pupil diameter was 2.8 ms. In contrast to previous findings without a multimodal approach suggesting a positive correlation between the constructs, the results contribute to the state of research by highlighting the importance of multimodal data validation and research on convergent vagility. Future research should consider emotional regulation strategies and emotional valence.
多模态数据分析和验证，基于最先进的传感器技术（如眼动跟踪或情感识别）和面部动作编码系统（FACTs）的深度学习，使教育研究者能够研究多方面的学习和解决问题过程，并改善教育体验。本研究旨在调查高、中、低情感唤醒（RQ 1a）连续传感器流之间是否存在相关性，以及深度学习编码的 emotional arousal（RQ 1b）与 pupil diameter（RQ 1a）之间的相关性（RQ 2）。 28名参与者针对三个具有挑战性和情感参与性的日常道德困境进行研究，同时收集了眼动跟踪和情感识别数据。数据经过预处理（同步、眨眼控制、降采样），并使用相关分析和支持性因果关系检验进行分析。 结果显示情感唤醒与瞳孔直径之间存在负相关。然而，只有高情感唤醒时存在显著的相关性，而中或低情感唤醒时则未发现显著的相关性。情感唤醒与瞳孔直径之间平均时间延迟为2.8毫秒。 与之前没有多模态方法的建议存在正相关关系的研究不同，本研究的结果对研究现状做出了重要贡献，突出了多模态数据验证和研究在 convergent vagility 中的重要性。未来的研究应考虑情感调节策略和情感价值。
The multi-modality and stochastic characteristics of human behavior make motion prediction a highly challenging task, which is critical for autonomous driving. While deep learning approaches have demonstrated their great potential in this area, it still remains unsolved to establish a connection between multiple driving scenes (e.g., merging, roundabout, intersection) and the design of deep learning models. Current learning-based methods typically use one unified model to predict trajectories in different scenarios, which may result in sub-optimal results for one individual scene. To address this issue, we propose Multi-Scenes Network (aka. MS-Net), which is a multi-path sparse model trained by an evolutionary process. MS-Net selectively activates a subset of its parameters during the inference stage to produce prediction results for each scene. In the training stage, the motion prediction task under differentiated scenes is abstracted as a multi-task learning problem, an evolutionary algorithm is designed to encourage the network search of the optimal parameters for each scene while sharing common knowledge between different scenes. Our experiment results show that with substantially reduced parameters, MS-Net outperforms existing state-of-the-art methods on well-established pedestrian motion prediction datasets, e.g., ETH and UCY, and ranks the 2nd place on the INTERACTION challenge.
人类行为的多样性和随机性使得运动预测成为一个非常具有挑战性的任务，这对自动驾驶至关重要。尽管深度学习方法在这个领域已经展现出了巨大的潜力，但仍然没有解决将多个驾驶场景（例如汇入、环行、交叉口）与深度学习模型设计之间的联系。基于当前的学习方法通常使用一个统一的模型预测不同场景中的轨迹，这可能导致单个场景的预测结果不是最优的。为了解决这个问题，我们提出了 Multi-Scenes Network（MS-Net），它是一种通过进化过程训练的多路径稀疏模型。在推理阶段，MS-Net 选择性地激活其参数以产生每个场景的预测结果。在训练阶段，将不同场景下的运动预测任务抽象成一个多任务学习问题，设计一种进化算法来鼓励网络搜索每个场景的最优参数，并共享不同场景之间的共同知识。我们的实验结果表明，在大幅减少参数的情况下，MS-Net 在成熟的人行道运动预测数据集（例如 ETH 和 UCY）上优于现有最先进的方法，并在 INTERACTION 挑战中排名第二。
Recent studies in neuro-symbolic learning have explored the integration of logical knowledge into deep learning via encoding logical constraints as an additional loss function. However, existing approaches tend to vacuously satisfy logical constraints through shortcuts, failing to fully exploit the knowledge. In this paper, we present a new framework for learning with logical constraints. Specifically, we address the shortcut satisfaction issue by introducing dual variables for logical connectives, encoding how the constraint is satisfied. We further propose a variational framework where the encoded logical constraint is expressed as a distributional loss that is compatible with the model's original training loss. The theoretical analysis shows that the proposed approach bears salient properties, and the experimental evaluations demonstrate its superior performance in both model generalizability and constraint satisfaction.
Nowadays, with advanced information technologies deployed citywide, large data volumes and powerful computational resources are intelligentizing modern city development. As an important part of intelligent transportation, route recommendation and its applications are widely used, directly influencing citizens` travel habits. Developing smart and efficient travel routes based on big data (possibly multi-modal) has become a central challenge in route recommendation research. Our survey offers a comprehensive review of route recommendation work based on urban computing. It is organized by the following three parts: 1) Methodology-wise. We categorize a large volume of traditional machine learning and modern deep learning methods. Also, we discuss their historical relations and reveal the edge-cutting progress. 2) Application\-wise. We present numerous novel applications related to route commendation within urban computing scenarios. 3) We discuss current problems and challenges and envision several promising research directions. We believe that this survey can help relevant researchers quickly familiarize themselves with the current state of route recommendation research and then direct them to future research trends.
如今，随着先进的信息化技术的在城市范围内部署，大量数据和强大的计算能力使现代城市发展变得更加智能。作为智能交通的重要组成部分，路线推荐及其应用在很大程度上影响了公民的出行习惯。基于大数据（可能是多模态）开发智能和高效的出行路线已成为路线推荐研究的一个中心挑战。我们的调查对基于城市计算的路线推荐工作进行了全面的回顾。该调查分为以下三个部分： 1）方法论-wise。我们分类了大量传统机器学习方法和现代深度学习方法。我们还讨论了它们的历史关系，并揭示了边缘切割的进展。 2）应用-wise。我们在城市计算场景中呈现了大量的与路线推荐相关的全新应用。 3）我们讨论了当前的问题和挑战，并展望了几个有前景的研究方向。我们相信，这次调查可以帮助相关研究人员快速熟悉路线推荐研究的现状，然后将他们引导到未来的研究趋势。
In contemporary rural healthcare settings, the principal challenge in diagnosing brain images is the scarcity of available data, given that most of the existing deep learning models demand extensive training data to optimize their performance, necessitating centralized processing methods that potentially compromise data privacy. This paper proposes a novel framework tailored for brain tissue segmentation in rural healthcare facilities. The framework employs a deep reinforcement learning (DRL) environment in tandem with a refinement model (RM) deployed locally at rural healthcare sites. The proposed DRL model has a reduced parameter count and practicality for implementation across distributed rural sites. To uphold data privacy and enhance model generalization without transgressing privacy constraints, we employ federated learning (FL) for cooperative model training. We demonstrate the efficacy of our approach by training the network with a limited data set and observing a substantial performance enhancement, mitigating inaccuracies and irregularities in segmentation across diverse sites. Remarkably, the DRL model attains an accuracy of up to 80%, surpassing the capabilities of conventional convolutional neural networks when confronted with data insufficiency. Incorporating our RM results in an additional accuracy improvement of at least 10%, while FL contributes to a further accuracy enhancement of up to 5%. Collectively, the framework achieves an average 92% accuracy rate within rural healthcare settings characterized by data constraints.
Foundation models, especially LLMs, are profoundly transforming deep learning. Instead of training many task-specific models, we can adapt a single pretrained model to many tasks via fewshot prompting or fine-tuning. However, current foundation models apply to sequence data but not to time series, which present unique challenges due to the inherent diverse and multidomain time series datasets, diverging task specifications across forecasting, classification and other types of tasks, and the apparent need for task-specialized models. We developed UNITS, a unified time series model that supports a universal task specification, accommodating classification, forecasting, imputation, and anomaly detection tasks. This is achieved through a novel unified network backbone, which incorporates sequence and variable attention along with a dynamic linear operator and is trained as a unified model. Across 38 multi-domain datasets, UNITS demonstrates superior performance compared to task-specific models and repurposed natural language-based LLMs. UNITS exhibits remarkable zero-shot, few-shot, and prompt learning capabilities when evaluated on new data domains and tasks. The source code and datasets are available at this https URL.
Deep learning architectures have made significant progress in terms of performance in many research areas. The automatic speech recognition (ASR) field has thus benefited from these scientific and technological advances, particularly for acoustic modeling, now integrating deep neural network architectures. However, these performance gains have translated into increased complexity regarding the information learned and conveyed through these black-box architectures. Following many researches in neural networks interpretability, we propose in this article a protocol that aims to determine which and where information is located in an ASR acoustic model (AM). To do so, we propose to evaluate AM performance on a determined set of tasks using intermediate representations (here, at different layer levels). Regarding the performance variation and targeted tasks, we can emit hypothesis about which information is enhanced or perturbed at different architecture steps. Experiments are performed on both speaker verification, acoustic environment classification, gender classification, tempo-distortion detection systems and speech sentiment/emotion identification. Analysis showed that neural-based AMs hold heterogeneous information that seems surprisingly uncorrelated with phoneme recognition, such as emotion, sentiment or speaker identity. The low-level hidden layers globally appears useful for the structuring of information while the upper ones would tend to delete useless information for phoneme recognition.
The growing number of cases requiring digital forensic analysis raises concerns about law enforcement's ability to conduct investigations promptly. Consequently, this systemisation of knowledge paper delves into the potential and effectiveness of integrating Large Language Models (LLMs) into digital forensic investigation to address these challenges. A thorough literature review is undertaken, encompassing existing digital forensic models, tools, LLMs, deep learning techniques, and the utilisation of LLMs in investigations. The review identifies current challenges within existing digital forensic processes and explores both the obstacles and possibilities of incorporating LLMs. In conclusion, the study asserts that the adoption of LLMs in digital forensics, with appropriate constraints, holds the potential to enhance investigation efficiency, improve traceability, and alleviate technical and judicial barriers faced by law enforcement entities.
As cities continue to burgeon, Urban Computing emerges as a pivotal discipline for sustainable development by harnessing the power of cross-domain data fusion from diverse sources (e.g., geographical, traffic, social media, and environmental data) and modalities (e.g., spatio-temporal, visual, and textual modalities). Recently, we are witnessing a rising trend that utilizes various deep-learning methods to facilitate cross-domain data fusion in smart cities. To this end, we propose the first survey that systematically reviews the latest advancements in deep learning-based data fusion methods tailored for urban computing. Specifically, we first delve into data perspective to comprehend the role of each modality and data source. Secondly, we classify the methodology into four primary categories: feature-based, alignment-based, contrast-based, and generation-based fusion methods. Thirdly, we further categorize multi-modal urban applications into seven types: urban planning, transportation, economy, public safety, society, environment, and energy. Compared with previous surveys, we focus more on the synergy of deep learning methods with urban computing applications. Furthermore, we shed light on the interplay between Large Language Models (LLMs) and urban computing, postulating future research directions that could revolutionize the field. We firmly believe that the taxonomy, progress, and prospects delineated in our survey stand poised to significantly enrich the research community. The summary of the comprehensive and up-to-date paper list can be found at this https URL.
Blind video quality assessment (BVQA) plays a pivotal role in evaluating and improving the viewing experience of end-users across a wide range of video-based platforms and services. Contemporary deep learning-based models primarily analyze the video content in its aggressively downsampled format, while being blind to the impact of actual spatial resolution and frame rate on video quality. In this paper, we propose a modular BVQA model, and a method of training it to improve its modularity. Specifically, our model comprises a base quality predictor, a spatial rectifier, and a temporal rectifier, responding to the visual content and distortion, spatial resolution, and frame rate changes on video quality, respectively. During training, spatial and temporal rectifiers are dropped out with some probabilities so as to make the base quality predictor a standalone BVQA model, which should work better with the rectifiers. Extensive experiments on both professionally-generated content and user generated content video databases show that our quality model achieves superior or comparable performance to current methods. Furthermore, the modularity of our model offers a great opportunity to analyze existing video quality databases in terms of their spatial and temporal complexities. Last, our BVQA model is cost-effective to add other quality-relevant video attributes such as dynamic range and color gamut as additional rectifiers.
The development and progression of arthritis is strongly associated with osteophytes, which are small and elusive bone growths. This paper presents one of the first efforts towards automated spinal osteophyte detection in spinal X-rays. A novel automated patch extraction process, called SegPatch, has been proposed based on deep learning-driven vertebrae segmentation and the enlargement of mask contours. A final patch classification accuracy of 84.5\% is secured, surpassing a baseline tiling-based patch generation technique by 9.5%. This demonstrates that even with limited annotations, SegPatch can deliver superior performance for detection of tiny structures such as osteophytes. The proposed approach has potential to assist clinicians in expediting the process of manually identifying osteophytes in spinal X-ray.
Retinal fundus images play a crucial role in the early detection of eye diseases and, using deep learning approaches, recent studies have even demonstrated their potential for detecting cardiovascular risk factors and neurological disorders. However, the impact of technical factors on these images can pose challenges for reliable AI applications in ophthalmology. For example, large fundus cohorts are often confounded by factors like camera type, image quality or illumination level, bearing the risk of learning shortcuts rather than the causal relationships behind the image generation process. Here, we introduce a novel population model for retinal fundus images that effectively disentangles patient attributes from camera effects, thus enabling controllable and highly realistic image generation. To achieve this, we propose a novel disentanglement loss based on distance correlation. Through qualitative and quantitative analyses, we demonstrate the effectiveness of this novel loss function in disentangling the learned subspaces. Our results show that our model provides a new perspective on the complex relationship between patient attributes and technical confounders in retinal fundus image generation.
视网膜 fundus 图像在早期检测眼病方面扮演着关键角色，并且，使用深度学习方法，最近的研究甚至证明了它们在检测心血管风险因素和神经系统疾病方面的潜在能力。然而，技术因素对 these images 的影响可能会对眼科 AI 应用程序的可靠性造成挑战。例如，大型视网膜数据集常常受到相机类型、图像质量或照明水平等因素的困扰，从而可能导致学习 shortcuts 而不是图像生成过程背后的因果关系。在这里，我们引入了一个新的视网膜 fundus 图像人群模型，有效地将患者属性与相机效应分离，从而实现了可控制和高度逼真的图像生成。为了实现这一目标，我们提出了一个基于距离协定的新去中心化损失函数。通过定性和定量分析，我们证明了这种新损失函数在去中心化学习子空间方面的有效性。我们的结果表明，我们的模型为视网膜 fundus 图像生成中患者属性和技术干扰之间的复杂关系提供了一个新的视角。
Neural Architecture Search (NAS) paves the way for the automatic definition of Neural Network (NN) architectures, attracting increasing research attention and offering solutions in various scenarios. This study introduces a novel NAS solution, called Flat Neural Architecture Search (FlatNAS), which explores the interplay between a novel figure of merit based on robustness to weight perturbations and single NN optimization with Sharpness-Aware Minimization (SAM). FlatNAS is the first work in the literature to systematically explore flat regions in the loss landscape of NNs in a NAS procedure, while jointly optimizing their performance on in-distribution data, their out-of-distribution (OOD) robustness, and constraining the number of parameters in their architecture. Differently from current studies primarily concentrating on OOD algorithms, FlatNAS successfully evaluates the impact of NN architectures on OOD robustness, a crucial aspect in real-world applications of machine and deep learning. FlatNAS achieves a good trade-off between performance, OOD generalization, and the number of parameters, by using only in-distribution data in the NAS exploration. The OOD robustness of the NAS-designed models is evaluated by focusing on robustness to input data corruptions, using popular benchmark datasets in the literature.
Neural Architecture Search (NAS) 为神经网络 (NN) 架构的自动定义铺平了道路，吸引了越来越多的研究关注，并为各种场景提供了解决方案。本研究介绍了一种新颖的 NAS 解决方案，称为平滑神经架构搜索 (FlatNAS)，探讨了基于新颖的基于容错性的指标和基于Sharpness-Aware最小化 (SAM) 的单 NN 优化之间的相互作用。FlatNAS 是文献中第一个系统地探索 NN 损失函数平面上平局的解决方案，同时也在其分布数据上对其性能和离散数据 (OOD) 鲁棒性进行优化，并限制其架构中参数的数量。与当前研究主要集中于 OOD 算法的研究不同，FlatNAS 成功地评估了 NN 架构对 OOD 鲁棒性的影响，这是机器和深度学习在现实世界应用中至关重要的一个方面。通过仅使用分布数据进行 NAS 探索，FlatNAS 实现了性能、OO 泛化性和参数数量之间的良好平衡。 FlatNAS 对 NAS 设计的模型的 OOD 鲁棒性进行了评估，通过关注输入数据污染的鲁棒性，使用了文献中流行的基准数据集。
To facilitate diagnosis on cardiac ultrasound (US), clinical practice has established several standard views of the heart, which serve as reference points for diagnostic measurements and define viewports from which images are acquired. Automatic view recognition involves grouping those images into classes of standard views. Although deep learning techniques have been successful in achieving this, they still struggle with fully verifying the suitability of an image for specific measurements due to factors like the correct location, pose, and potential occlusions of cardiac structures. Our approach goes beyond view classification and incorporates a 3D mesh reconstruction of the heart that enables several more downstream tasks, like segmentation and pose estimation. In this work, we explore learning 3D heart meshes via graph convolutions, using similar techniques to learn 3D meshes in natural images, such as human pose estimation. As the availability of fully annotated 3D images is limited, we generate synthetic US images from 3D meshes by training an adversarial denoising diffusion model. Experiments were conducted on synthetic and clinical cases for view recognition and structure detection. The approach yielded good performance on synthetic images and, despite being exclusively trained on synthetic data, it already showed potential when applied to clinical images. With this proof-of-concept, we aim to demonstrate the benefits of graphs to improve cardiac view recognition that can ultimately lead to better efficiency in cardiac diagnosis.
为了在心脏超声（US）检查中进行诊断，临床实践已经建立了几组心脏标准视图，这些视图作为诊断测量参考点并定义了获取图像的视角。自动视图识别涉及将图像分组到标准视图类别中。尽管深度学习技术已经在实现这一目标上取得了成功，但由于因素如正确位置、姿态和心脏结构的遮挡，它们仍难以完全验证图像是否适用于特定测量。我们的方法超越了视图分类，并引入了心脏3D网格重建，使 several 后续任务成为可能，如分割和姿态估计。在这项工作中，我们通过图卷积学习学习3D心网格，使用类似的人体姿态估计技术来学习自然图像中的3D网格。由于完全标注的3D图像的可用性有限，我们通过训练对抗去噪扩散模型从3D网格生成合成US图像。我们对视图识别和结构检测进行实验。该方法在合成图像上表现出良好的性能，尽管它仅在合成数据上进行训练，但已经表现出将应用于临床图像的潜在能力。通过这一概念证明，我们旨在展示图对提高心脏视图识别效果所带来的好处，从而最终改善心脏诊断的效率。
Atmospheric turbulence poses a challenge for the interpretation and visual perception of visual imagery due to its distortion effects. Model-based approaches have been used to address this, but such methods often suffer from artefacts associated with moving content. Conversely, deep learning based methods are dependent on large and diverse datasets that may not effectively represent any specific content. In this paper, we address these problems with a self-supervised learning method that does not require ground truth. The proposed method is not dependent on any dataset outside of the single data sequence being processed but is also able to improve the quality of any input raw sequences or pre-processed sequences. Specifically, our method is based on an accelerated Deep Image Prior (DIP), but integrates temporal information using pixel shuffling and a temporal sliding window. This efficiently learns spatio-temporal priors leading to a system that effectively mitigates atmospheric turbulence distortions. The experiments show that our method improves visual quality results qualitatively and quantitatively.
大气扰动对图像的解释和视觉感知构成了挑战,因为它的扭曲效应会导致图像质量下降。为了解决这个问题,已经使用了基于模型的方法,但这种方法通常会导致与运动内容相关的伪影。相反,基于深度学习的方法依赖于大型和多样化的数据集,可能无法有效表示任何特定内容。在本文中,我们通过一种不需要地面真实值的自监督学习方法来解决这些问题。所提出的方法不仅不依赖于处理过程中的任何数据序列,而且还可以提高任何输入原始序列或预处理序列的质量。具体来说,我们的方法基于加速的Deep Image Prior(DIP),并通过像素洗牌和时间滑动窗口整合时间信息。这种方法有效地学习空间-时间先验,导致一个可以有效缓解大气扰动扭曲效应的系统。实验结果表明,我们的方法在质量和数量上都有所提高。