Enhancing the robustness of deep learning models, particularly vision transformers (ViTs), is crucial for their real-world deployment. In this work, we propose a fine-tuning approach to enhance the robustness of vision transformers, inspired by the concept of the nullspace from linear algebra. Our investigation centers on whether a vision transformer can exhibit resilience to input variations akin to the nullspace property of linear mappings, implying that perturbations sampled from this nullspace do not influence the model's output when added to the input. Firstly, we show that for many pretrained ViTs a non-trivial nullspace exists due to the presence of the patch embedding layer. Secondly, as the nullspace is a concept from linear algebra, we demonstrate that it is possible to synthesize approximate nullspace elements for the non-linear blocks of ViTs using an optimisation strategy. Finally, we propose a fine-tuning strategy for ViTs wherein we augment the training data with synthesized approximate nullspace noise. After fine-tuning, we find that the model demonstrates robustness to adversarial and natural image perturbations alike.
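The nullspace argument for the (linear) patch-embedding layer can be sketched in a few lines of numpy; the patch and embedding dimensions below are hypothetical, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical patch-embedding dimensions: 16x16 RGB patches (768-dim)
# projected to a 384-dim token, so the linear map has a nontrivial nullspace.
patch_dim, embed_dim = 16 * 16 * 3, 384
W = rng.standard_normal((embed_dim, patch_dim))

# Orthonormal basis of the nullspace via SVD: the trailing rows of Vt
# span {v : W v = 0}.
_, _, Vt = np.linalg.svd(W)
null_basis = Vt[embed_dim:]            # (patch_dim - embed_dim) basis vectors

# Any combination of these basis vectors is an input perturbation the
# embedding layer cannot see.
noise = null_basis.T @ rng.standard_normal(null_basis.shape[0])

patch = rng.standard_normal(patch_dim)
clean = W @ patch
perturbed = W @ (patch + noise)
print(np.max(np.abs(clean - perturbed)))  # numerically ~0
```

Because the patch embedding maps each flattened patch linearly to a lower-dimensional token, a perturbation built from the trailing right-singular vectors is provably invisible to everything downstream; the paper extends this idea only approximately to the non-linear blocks.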
https://arxiv.org/abs/2403.10476
Combining empirical risk minimization with capacity control is a classical strategy in machine learning for controlling the generalization gap and avoiding overfitting as the capacity of the model class grows. Yet, in modern deep learning practice, very large over-parameterized models (e.g. neural networks) are optimized to fit the training data perfectly and still obtain great generalization performance. Past the interpolation point, increasing model complexity seems to actually lower the test error. In this tutorial, we explain the concept of double descent and its mechanisms. The first section sets up the classical statistical learning framework and introduces the double descent phenomenon. By looking at a number of examples, section 2 introduces inductive biases that appear to play a key role in double descent by selecting, among the multiple interpolating solutions, a smooth empirical risk minimizer. Finally, section 3 explores double descent with two linear models, and gives other points of view from recent related works.
https://arxiv.org/abs/2403.10459
Current rock engineering design in drill-and-blast tunnelling primarily relies on engineers' observational assessments. Measure While Drilling (MWD) data, a high-resolution sensor dataset collected during tunnel excavation, is underutilised, serving mainly for geological visualisation. This study aims to automate the translation of MWD data into actionable metrics for rock engineering. It seeks to link data to specific engineering actions, thus providing critical decision support for geological challenges ahead of the tunnel face. Leveraging a large and geologically diverse dataset of 500,000 drillholes from 15 tunnels, the research introduces models for accurate rock mass quality classification in a real-world tunnelling context. Both conventional machine learning and image-based deep learning are explored to classify MWD data into Q-classes and Q-values, examples of metrics describing the stability of the rock mass, using both tabular and image data. The results indicate that the K-nearest neighbours algorithm, in an ensemble with tree-based models on tabular data, effectively classifies rock mass quality. It achieves a cross-validated balanced accuracy of 0.86 in classifying rock mass into the Q-classes A, B, C, D, E1, E2, and 0.95 for a binary classification of E versus the rest. Classification using a CNN with MWD images for each blasting round resulted in a balanced accuracy of 0.82 for binary classification. Regressing the Q-value from tabular MWD data achieved cross-validated R2 and MSE scores of 0.80 and 0.18 for an ensemble model similar to the one used in classification. High performance in regression and classification boosts confidence in automated rock mass assessment. Applying advanced modelling to a unique dataset demonstrates MWD data's value in improving rock mass classification accuracy and advancing data-driven rock engineering design, reducing manual intervention.
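As a sketch of the tabular pipeline's evaluation (the study's actual model is an ensemble of K-nearest neighbours with tree-based methods on ~500,000 drillholes), here is a minimal k-NN classifier and the balanced-accuracy metric on synthetic stand-in features:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for tabular MWD features; dimensions are illustrative.
n_per_class, n_feat = 60, 4
classes = np.array([0, 1])                       # e.g. class "E" vs "rest"
X = np.vstack([rng.normal(c * 1.5, 1.0, (n_per_class, n_feat)) for c in classes])
y = np.repeat(classes, n_per_class)

def knn_predict(X_train, y_train, X_query, k=5):
    # Brute-force k-NN with majority vote among the k nearest neighbours.
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = y_train[np.argsort(d, axis=1)[:, :k]]
    return np.array([np.bincount(row).argmax() for row in nearest])

def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recalls: the metric reported in the abstract.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Hold out every 4th sample as a crude test split.
test_idx = np.arange(len(y)) % 4 == 0
pred = knn_predict(X[~test_idx], y[~test_idx], X[test_idx])
score = balanced_accuracy(y[test_idx], pred)
```

Balanced accuracy is the right headline number here precisely because, as in the paper's E-versus-rest task, the classes are imbalanced and plain accuracy would reward ignoring the rare class.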
https://arxiv.org/abs/2403.10404
Deep learning (DL) models have emerged as a powerful tool in avian bioacoustics to diagnose environmental health and biodiversity. However, inconsistencies in research pose notable challenges hindering progress in this domain. Reliable DL models need to analyze bird calls flexibly across various species and environments to fully harness the potential of bioacoustics in a cost-effective passive acoustic monitoring scenario. Data fragmentation and opacity across studies complicate a comprehensive evaluation of general model performance. To overcome these challenges, we present the BirdSet benchmark, a unified framework consolidating research efforts with a holistic approach for classifying bird vocalizations in avian bioacoustics. BirdSet harmonizes open-source bird recordings into a curated dataset collection. This unified approach provides an in-depth understanding of model performance and identifies potential shortcomings across different tasks. By establishing baseline results of current models, BirdSet aims to facilitate comparability, guide subsequent data collection, and increase accessibility for newcomers to avian bioacoustics.
https://arxiv.org/abs/2403.10380
Advances in deep learning have made possible reliable landmark tracking of human bodies and faces that can be used for a variety of tasks. We test a recent computer vision solution, MediaPipe Holistic (MPH), to find out whether its tracking of facial features is reliable enough for a linguistic analysis of data from sign languages, and compare it to an older solution (OpenFace, OF). We use an existing dataset of sentences in Kazakh-Russian Sign Language and a newly created small dataset of videos with head tilts and eyebrow movements. We find that MPH does not perform well enough for linguistic analysis of eyebrow movement -- but in a different way from OF, which also performs poorly without correction. We reiterate a previous proposal to train additional correction models to overcome these limitations.
https://arxiv.org/abs/2403.10367
Accelerating dynamic MRI is essential for enhancing clinical applications, such as adaptive radiotherapy, and improving patient comfort. Traditional deep learning (DL) approaches for accelerated dynamic MRI reconstruction typically rely on predefined or random subsampling patterns, applied uniformly across all temporal phases. This standard practice overlooks the potential benefits of leveraging temporal correlations and lacks the adaptability required for case-specific subsampling optimization, which holds the potential for maximizing reconstruction quality. Addressing this gap, we present a novel end-to-end framework for adaptive dynamic MRI subsampling and reconstruction. Our pipeline integrates a DL-based adaptive sampler, generating case-specific dynamic subsampling patterns, trained end-to-end with a state-of-the-art 2D dynamic reconstruction network, namely vSHARP, which effectively reconstructs the adaptive dynamic subsampled data into a moving image. Our method is assessed using dynamic cine cardiac MRI data, comparing its performance against vSHARP models that employ common subsampling trajectories, and pipelines trained to optimize dataset-specific sampling schemes alongside vSHARP reconstruction. Our results indicate superior reconstruction quality, particularly at high accelerations.
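A minimal sketch of what dynamic (retrospective) subsampling looks like, with a random per-frame line mask standing in for the learned, case-specific sampler; shapes and the toy data are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dynamic "cine" data: a stack of temporal frames.
n_frames, h, w = 8, 64, 64
images = rng.standard_normal((n_frames, h, w))

acceleration = 4
masks = np.zeros((n_frames, h, w), dtype=bool)
for t in range(n_frames):
    # A different set of k-space lines per temporal phase: this per-phase
    # choice is the degree of freedom an adaptive sampler would optimize.
    lines = rng.choice(h, size=h // acceleration, replace=False)
    masks[t, lines, :] = True

# Simulate acquisition: mask the k-space, then zero-filled reconstruction
# (a learned network such as vSHARP would replace the plain inverse FFT).
kspace = np.fft.fft2(images, axes=(-2, -1))
zero_filled = np.fft.ifft2(kspace * masks, axes=(-2, -1)).real

undersampling_ratio = masks.mean()   # fraction of k-space actually sampled
```

The end-to-end idea in the abstract is to make `masks` the output of a trainable module and backpropagate the reconstruction loss through both the reconstruction network and the sampler.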
https://arxiv.org/abs/2403.10346
Classical structure-based visual localization methods offer high accuracy but face trade-offs in terms of storage, speed, and privacy. A recent innovation, a keypoint scene coordinate regression (KSCR) method named D2S, addresses these issues by leveraging graph attention networks to enhance keypoint relationships and predict their 3D coordinates using a simple multilayer perceptron (MLP). Camera pose is then determined via PnP+RANSAC, using the established 2D-3D correspondences. While KSCR achieves competitive results, rivaling state-of-the-art image-retrieval methods like HLoc across multiple benchmarks, its performance is hindered when data samples are limited, due to the deep learning model's reliance on extensive data. This paper addresses this challenge by introducing a pipeline for keypoint descriptor synthesis using Neural Radiance Fields (NeRF). By generating novel poses and feeding them into a trained NeRF model to create new views, our approach enhances KSCR's generalization capabilities in data-scarce environments. The proposed system can improve localization accuracy by up to 50\% while requiring only a fraction of the time for data synthesis. Furthermore, its modular design allows for the integration of multiple NeRFs, offering a versatile and efficient solution for visual localization. The implementation is publicly available at: this https URL.
https://arxiv.org/abs/2403.10297
How well the heart is functioning can be quantified through measurements of myocardial deformation via echocardiography. Clinical assessment of cardiac function is generally focused on global indices of relative shortening; however, territorial and segmental strain indices have been shown to be abnormal in regions of myocardial disease, such as scar. In this work, we propose a single framework to predict myocardial disease substrates at global, territorial, and segmental levels, using regional myocardial strain traces as input to a convolutional neural network (CNN)-based classification algorithm. An anatomically meaningful mapping of the input data from the clinically standard bullseye representation to a multi-channel 2D image is proposed to formulate the task as an image classification problem, thus enabling the use of state-of-the-art neural network configurations. A Fully Convolutional Network (FCN) is trained to detect and localize myocardial scar from regional left ventricular (LV) strain patterns. Simulated regional strain data from a controlled dataset of virtual patients with varying degrees and locations of myocardial scar is used for training and validation. The proposed method successfully detects and localizes the scars on 98% of the 5490 LV segments of the 305 patients in the test set using strain traces only. Because scar is sparse, only 10% of the LV segments in the virtual patient cohort contain scar. Taking this imbalance into account, the class-balanced accuracy is 95%. Performance is reported at global, territorial, and segmental levels. The proposed method proves successful on the strain traces of the virtual cohort and offers the potential to solve the regional myocardial scar detection problem on strain traces from real patient cohorts.
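One plausible way to realize the trace-to-image formulation described above (the exact layout used by the authors is not reproduced here) is to stack per-segment strain traces as image rows, with derived quantities as extra channels:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regional strain traces: 18 LV segments (consistent with the
# abstract's 5490 segments / 305 patients), each a time series over one
# cardiac cycle. Real traces would come from speckle tracking.
n_segments, n_timepoints = 18, 100
traces = rng.standard_normal((n_segments, n_timepoints))

# Channel 0: the raw strain traces, one image row per segment.
# Channel 1: their temporal derivative (strain rate), a common companion
# input; the choice of channels here is illustrative.
image = np.stack([traces, np.gradient(traces, axis=1)], axis=0)
# image now has a (C, H, W) layout ready for a standard CNN / FCN.
```

Keeping anatomically adjacent segments in adjacent rows is what makes the convolutional inductive bias meaningful for this representation.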
https://arxiv.org/abs/2403.10291
Surgical instrument segmentation in laparoscopy is essential for computer-assisted surgical systems. Despite the progress of deep learning in recent years, the dynamic setting of laparoscopic surgery still presents challenges for precise segmentation. The nnU-Net framework has excelled at semantic segmentation of single frames, without temporal information. The framework's ease of use, including its ability to be configured automatically, and its low expertise requirements have made it a popular base framework for comparisons. Optical flow (OF) is a tool commonly used in video tasks to estimate motion and represent it in a single frame, containing temporal information. This work seeks to employ OF maps as an additional input to the nnU-Net architecture to improve its performance in the surgical instrument segmentation task, taking advantage of the fact that instruments are the main moving objects in the surgical field. With this new input, the temporal component is added indirectly, without modifying the architecture. Using the CholecSeg8k dataset, three different representations of movement were estimated, used as new inputs, and compared against a baseline model. Results showed that the use of OF maps improves the detection of classes with high movement, even when these are scarce in the dataset. To further improve performance, future work may focus on implementing other OF-preserving augmentations.
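The input-level fusion described above can be sketched as simple channel concatenation; shapes are illustrative and the flow map below is random rather than estimated:

```python
import numpy as np

rng = np.random.default_rng(0)

h, w = 128, 128
frame = rng.random((3, h, w))          # current RGB laparoscopy frame
flow = rng.standard_normal((2, h, w))  # (dx, dy) optical-flow map for the frame

# The segmentation network itself is untouched; only its expected number
# of input channels grows from 3 to 5, so the temporal information enters
# indirectly through the input.
fused_input = np.concatenate([frame, flow], axis=0)

# A magnitude-only map is one example of an alternative single-channel
# motion representation one could compare against.
flow_magnitude = np.linalg.norm(flow, axis=0, keepdims=True)
```

In practice the flow would be precomputed between consecutive frames with an OF estimator, and any geometric augmentation must be applied consistently to the frame and the flow channels (the "OF-preserving augmentations" the abstract mentions).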
https://arxiv.org/abs/2403.10216
In goal-oriented communications, the objective of the receiver is often to apply a deep-learning model, rather than to reconstruct the original data. In this context, learning directly over compressed data, without any prior decoding, holds promise for enhancing the time-efficient execution of inference models at the receiver. However, conventional entropy-coding methods such as Huffman and arithmetic coding break the data structure, rendering them unsuitable for learning without decoding. In this paper, we propose an alternative approach in which entropy coding is realized with Low-Density Parity Check (LDPC) codes. We hypothesize that deep learning models can more effectively exploit the internal code structure of LDPC codes. At the receiver, we leverage a specific class of recurrent neural networks (RNNs), namely the Gated Recurrent Unit (GRU), trained for image classification. Our numerical results indicate that classification based on LDPC-coded bit-planes surpasses Huffman and arithmetic coding while requiring a significantly smaller learning model. This demonstrates the efficiency of classifying directly from LDPC-coded data, eliminating the need for any form of decompression, even partial, prior to applying the learning model.
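The bit-planes mentioned above are easy to illustrate; the sketch below extracts the eight binary planes of an 8-bit image, which in the paper would then be LDPC-coded and classified without decoding (the LDPC step itself is omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random stand-in for an 8-bit grayscale image.
img = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Plane b holds bit b of every pixel; plane 7 is the most significant.
planes = np.stack([(img >> b) & 1 for b in range(8)], axis=0)  # (8, H, W)

# Sanity check: the planes losslessly reconstruct the image.
reconstructed = sum(planes[b].astype(np.uint8) << b for b in range(8))
```

Each binary plane is exactly the kind of structured bit sequence an LDPC code operates on, which is what lets the coded representation keep spatial structure that Huffman or arithmetic coding would scramble.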
https://arxiv.org/abs/2403.10202
Deep learning (DL) models have been advancing automatic medical image analysis on various modalities, including echocardiography, by offering a comprehensive end-to-end training pipeline. This approach enables DL models to regress the ejection fraction (EF) directly from 2D+time echocardiograms, resulting in superior performance. However, the end-to-end training pipeline makes the learned representations less explainable. The representations may also fail to capture the continuous relation among echocardiogram clips, indicating the existence of spurious correlations, which can negatively affect generalization. To mitigate this issue, we propose CoReEcho, a novel training framework emphasizing continuous representations tailored for direct EF regression. Our extensive experiments demonstrate that CoReEcho: 1) outperforms the current state-of-the-art (SOTA) on the largest echocardiography dataset (EchoNet-Dynamic) with an MAE of 3.90 and an R2 of 82.44, and 2) provides robust and generalizable features that transfer more effectively to related downstream tasks. The code is publicly available at this https URL.
https://arxiv.org/abs/2403.10164
Cardiac valve event timing plays a crucial role when conducting clinical measurements using echocardiography. However, established automated approaches are limited by the need for external electrocardiogram sensors, and manual measurements often rely on timing from different cardiac cycles. Recent methods have applied deep learning to cardiac timing, but they have mainly been restricted to detecting only two key time points, namely end-diastole (ED) and end-systole (ES). In this work, we propose a deep learning approach that leverages triplane recordings to enhance detection of valve events in echocardiography. Our method demonstrates improved performance detecting six different events, including the valve events conventionally associated with ED and ES. Across all events, we achieve an average absolute frame difference (aFD) ranging from at most 1.4 frames (29 ms), for the start of diastasis, down to 0.6 frames (12 ms), for mitral valve opening, when performing ten-fold cross-validation with test splits on triplane data from 240 patients. On an external independent test consisting of apical long-axis data from 180 other patients, the worst-performing event detection had an aFD of 1.8 frames (30 ms). The proposed approach has the potential to significantly impact clinical practice by enabling more accurate, rapid, and comprehensive event detection, leading to improved clinical measurements.
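The aFD metric reported above can be sketched directly; the predictions, annotations, and frame rate below are made up (the abstract's 1.4 frames ≈ 29 ms implies recordings of roughly 50 fps):

```python
import numpy as np

def afd(pred_frames, true_frames, frame_rate=None):
    """Average absolute frame difference between predicted and annotated
    event frames; optionally also returned in milliseconds."""
    diff = np.mean(np.abs(np.asarray(pred_frames) - np.asarray(true_frames)))
    return diff if frame_rate is None else (diff, 1000.0 * diff / frame_rate)

# Hypothetical predicted vs annotated frames for one event type
# (e.g. mitral valve opening) across four recordings.
pred = [101, 54, 77, 132]
true = [100, 55, 77, 130]
frames, ms = afd(pred, true, frame_rate=50)   # -> 1.0 frames, 20.0 ms
```

The frame-to-millisecond conversion is what makes aFD values comparable across recordings with different frame rates.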
https://arxiv.org/abs/2403.10156
In the wake of the global spread of monkeypox, accurate disease recognition has become crucial. This study introduces an improved SE-InceptionV3 model, embedding the SENet module and incorporating L2 regularization into the InceptionV3 framework to enhance monkeypox disease detection. Utilizing the Kaggle monkeypox dataset, which includes images of monkeypox and similar skin conditions, our model demonstrates a noteworthy accuracy of 96.71% on the test set, outperforming conventional methods and deep learning models. The SENet module's channel attention mechanism significantly elevates feature representation, while L2 regularization ensures robust generalization. Extensive experiments validate the model's superiority in precision, recall, and F1 score, highlighting its effectiveness in differentiating monkeypox lesions in diverse and complex cases. The study not only provides insights into the application of advanced CNN architectures in medical diagnostics but also opens avenues for further research into model optimization and hyperparameter tuning for enhanced disease recognition. this https URL
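A minimal numpy sketch of the SENet channel-attention mechanism (the squeeze-and-excitation block) together with the L2 penalty term; weights and dimensions are random stand-ins, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation channel attention on a (C, H, W) map."""
    squeeze = feature_map.mean(axis=(1, 2))               # global average pool -> (C,)
    excite = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))  # bottleneck MLP -> gates in (0, 1)
    return feature_map * excite[:, None, None]            # per-channel rescaling

# Toy dimensions (InceptionV3 feature maps are much larger); reduction ratio 4.
c, h, w, r = 16, 8, 8, 4
fmap = rng.standard_normal((c, h, w))
w1 = rng.standard_normal((c // r, c)) * 0.1
w2 = rng.standard_normal((c, c // r)) * 0.1
out = se_block(fmap, w1, w2)

# The L2 regularization term the training objective would add; the
# coefficient 1e-4 is an illustrative choice, not the paper's setting.
l2_penalty = 1e-4 * (np.sum(w1 ** 2) + np.sum(w2 ** 2))
```

The sigmoid gates are bounded in (0, 1), so the block can only attenuate channels, which is how it reweights feature importance without destabilizing the backbone.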
https://arxiv.org/abs/2403.10087
Shadow removal is a task aimed at erasing regional shadows present in images and reinstating visually pleasing natural scenes with consistent illumination. While recent deep learning techniques have demonstrated impressive performance in image shadow removal, their robustness against adversarial attacks remains largely unexplored. Furthermore, many existing attack frameworks typically allocate a uniform budget for perturbations across the entire input image, which may not be suitable for attacking shadow images. This is primarily due to the unique characteristic of spatially varying illumination within shadow images. In this paper, we propose a novel approach, called shadow-adaptive adversarial attack. Different from standard adversarial attacks, our attack budget is adjusted based on the pixel intensity in different regions of shadow images. Consequently, the optimized adversarial noise in the shadowed regions becomes visually less perceptible while permitting a greater tolerance for perturbations in non-shadow regions. The proposed shadow-adaptive attacks naturally align with the varying illumination distribution in shadow images, resulting in perturbations that are less conspicuous. Building on this, we conduct a comprehensive empirical evaluation of existing shadow removal methods, subjecting them to various levels of attack on publicly available datasets.
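A sketch of the core idea, with a made-up linear mapping from pixel intensity to per-pixel budget (the paper's exact budget function is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

h, w = 64, 64
image = rng.random((h, w))                   # grayscale shadow image in [0, 1]
shadow_mask = image < 0.3                    # crude stand-in for the shadow region

# Darker (shadowed) pixels get a smaller epsilon so the noise stays
# imperceptible there; brighter pixels tolerate larger perturbations.
eps_min, eps_max = 2.0 / 255, 8.0 / 255
budget = eps_min + (eps_max - eps_min) * image   # per-pixel epsilon map

# Project an arbitrary perturbation onto the shadow-adaptive budget,
# then form the adversarial image.
delta = rng.standard_normal((h, w)) * 0.1
delta = np.clip(delta, -budget, budget)
adversarial = np.clip(image + delta, 0.0, 1.0)
```

Inside an iterative attack such as PGD, this per-pixel clipping simply replaces the usual uniform epsilon-ball projection.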
https://arxiv.org/abs/2403.10076
No-reference point cloud quality assessment (NR-PCQA) aims to automatically predict the perceptual quality of point clouds without a reference, and has achieved remarkable performance due to the utilization of deep learning-based models. However, these data-driven models suffer from the scarcity of labeled data and perform unsatisfactorily in cross-dataset evaluations. To address this problem, we propose a self-supervised pre-training framework using masked autoencoders (PAME) to help the model learn useful representations without labels. Specifically, after projecting point clouds into images, our PAME employs dual-branch autoencoders that reconstruct masked patches of the distorted images into the corresponding patches of the reference and distorted images, respectively. In this manner, the two branches can separately learn content-aware features and distortion-aware features from the projected images. Furthermore, in the model fine-tuning stage, the learned content-aware features serve as a guide for fusing the point cloud quality features extracted from different perspectives. Extensive experiments show that our method outperforms state-of-the-art NR-PCQA methods on popular benchmarks in terms of prediction accuracy and generalizability.
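The patch-masking step of a masked autoencoder can be sketched as follows; the 75% mask ratio is a common MAE default, not necessarily PAME's, and the projected image here is random:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a point cloud projected to a grayscale image, split into a
# 14x14 grid of 16x16 patches (ViT-style patchification).
patch, grid = 16, 14
image = rng.random((grid * patch, grid * patch))
patches = (image.reshape(grid, patch, grid, patch)
                .swapaxes(1, 2)
                .reshape(grid * grid, patch, patch))

# Randomly hide a fixed fraction of the patches.
mask_ratio = 0.75
n_masked = int(mask_ratio * len(patches))
masked_ids = rng.choice(len(patches), size=n_masked, replace=False)

visible = np.delete(patches, masked_ids, axis=0)   # encoder input
targets = patches[masked_ids]                      # reconstruction targets
```

In the dual-branch setup described above, both branches would mask the distorted projection; one branch is trained to reconstruct the reference patches (content) and the other the distorted patches (distortion).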
https://arxiv.org/abs/2403.10061
Accurate 2D+T myocardium segmentation in cine cardiac magnetic resonance (CMR) scans is essential to comprehensively analyze LV motion throughout the cardiac cycle. The Segment Anything Model (SAM), known for its accurate segmentation and zero-shot generalization, has not yet been tailored for CMR 2D+T segmentation. We therefore introduce CMR2D+T-SAM, a novel approach to adapt SAM for CMR 2D+T segmentation using spatio-temporal adaptation. This approach also incorporates a U-Net framework for multi-scale feature extraction, as well as text prompts for accurate segmentation on both short-axis (SAX) and long-axis (LAX) views using a single model. CMR2D+T-SAM outperforms existing deep learning methods on the STACOM2011 dataset, achieving a myocardium Dice score of 0.885 and a Hausdorff distance (HD) of 2.900 pixels. It also demonstrates superior zero-shot generalization on the ACDC dataset, with a Dice score of 0.840 and an HD of 4.076 pixels.
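The two metrics quoted above, the Dice score and the Hausdorff distance, computed on toy binary masks:

```python
import numpy as np

def dice(a, b):
    """Dice overlap of two binary masks: 2|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def hausdorff(a, b):
    """Symmetric Hausdorff distance (in pixels) between two binary masks,
    computed brute-force over all foreground pixels (fine for small masks)."""
    pa = np.argwhere(a).astype(float)
    pb = np.argwhere(b).astype(float)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Two overlapping toy "myocardium" masks: the prediction extends two rows
# past the ground truth.
pred = np.zeros((32, 32), dtype=bool); pred[8:24, 8:24] = True
gt = np.zeros((32, 32), dtype=bool);   gt[10:24, 8:24] = True

d_score = dice(pred, gt)   # ≈ 0.933
hd = hausdorff(pred, gt)   # 2.0 pixels
```

Dice measures bulk overlap while the Hausdorff distance is driven by the single worst boundary error, which is why papers report both.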
https://arxiv.org/abs/2403.10009
Exploring and modeling the rain generation mechanism is critical for augmenting paired data to ease the training of rainy image processing models. To this end, this study proposes a novel deep learning based rain generator, which fully takes the physical generation mechanism underlying rain into consideration and explicitly encodes the learning of the fundamental rain factors (i.e., shape, orientation, length, width and sparsity) into the deep network. Its significance lies in that the generator not only elaborately designs the essential elements of rain to simulate expected rains, like conventional artificial strategies, but also finely adapts to complicated and diverse practical rainy images, like deep learning methods. By adopting a filter parameterization technique, we achieve, for the first time, a deep network that is finely controllable with respect to rain factors and able to learn the distribution of these factors purely from data. Our unpaired generation experiments demonstrate that the rain generated by the proposed generator is not only of higher quality but also more effective for deraining and downstream tasks than that of current state-of-the-art rain generation methods. Besides, the paired data augmentation experiments, covering both in-distribution and out-of-distribution (OOD) settings, further validate the diversity of the samples generated by our model for in-distribution deraining and OOD generalization tasks.
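A crude, fully hand-crafted rain layer built from the named factors (orientation, length, width, sparsity; the shape is fixed to a line segment) illustrates what the generator controls; the paper learns these factors from data rather than fixing them as below:

```python
import numpy as np

rng = np.random.default_rng(0)

def render_rain(h, w, length=9, angle_deg=70.0, width=1, sparsity=0.002):
    """Rasterize random line-segment streaks onto a zero canvas.
    All parameter values are illustrative defaults."""
    layer = np.zeros((h, w))
    n_streaks = max(1, int(sparsity * h * w))     # sparsity controls count
    dy = np.cos(np.deg2rad(angle_deg))            # orientation of the streak
    dx = np.sin(np.deg2rad(angle_deg))
    for _ in range(n_streaks):
        y0, x0 = rng.uniform(0, h), rng.uniform(0, w)
        for t in np.linspace(0, length, 2 * length):
            for off in range(width):              # width thickens the streak
                y, x = int(y0 + t * dy), int(x0 + t * dx) + off
                if 0 <= y < h and 0 <= x < w:
                    layer[y, x] = 1.0
    return layer

rain = render_rain(128, 128)
# Composite onto a toy background to get a synthetic rainy image.
rainy = np.clip(rng.random((128, 128)) * 0.6 + rain * 0.4, 0.0, 1.0)
```

Replacing these hard-coded parameters with distributions parameterized by a network (via the filter parameterization the abstract mentions) is what makes the generator both physically interpretable and trainable.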
https://arxiv.org/abs/2403.09993
Brain tumors remain a critical global health challenge, necessitating advancements in diagnostic techniques and treatment methodologies. A tumor or its recurrence often needs to be identified in imaging studies and differentiated from normal brain tissue. In response to the growing need for age-specific segmentation models, particularly for pediatric patients, this study explores the deployment of deep learning techniques using magnetic resonance imaging (MRI) modalities. By introducing a novel ensemble approach using ONet and modified versions of UNet, coupled with innovative loss functions, this study achieves a precise segmentation model for the BraTS-PEDs 2023 Challenge. Data augmentation, including both single and composite transformations, ensures model robustness and accuracy across different scanning protocols. The ensemble strategy, integrating the ONet and UNet models, shows greater effectiveness in capturing specific features and modeling diverse aspects of the MRI images, resulting in lesion-wise Dice scores of 0.52, 0.72 and 0.78 on unseen validation data and 0.55, 0.70, 0.79 on final testing data for the "enhancing tumor", "tumor core" and "whole tumor" labels, respectively. Visual comparisons further confirm the superiority of the ensemble method in accurate tumor region coverage. The results indicate that this advanced ensemble approach, building upon the unique strengths of the individual models, offers promising prospects for enhanced diagnostic accuracy and effective treatment planning and monitoring of brain tumors in pediatric patients.
https://arxiv.org/abs/2308.07212
Conventional imaging diagnostics frequently encounter bottlenecks due to manual inspection, which can lead to delays and inconsistencies. Although deep learning offers a pathway to automation and enhanced accuracy, foundational models in computer vision often emphasize global context at the expense of local details, which are vital for medical imaging diagnostics. To address this, we harness the Swin Transformer's capacity to discern extended spatial dependencies within images through its hierarchical framework. Our novel contribution lies in refining local feature representations, orienting them specifically toward the final distribution of the classifier. This method ensures that local features are not only preserved but also enriched with task-specific information, enhancing their relevance and detail at every hierarchical level. By implementing this strategy, our model demonstrates significant robustness and precision, as evidenced by extensive validation on two established benchmarks for Knee OsteoArthritis (KOA) grade classification. These results highlight our approach's effectiveness and its promising implications for the future of medical imaging diagnostics. Our implementation is available on this https URL
https://arxiv.org/abs/2403.09947
In the healthcare domain, Magnetic Resonance Imaging (MRI) assumes a pivotal role, as it employs Artificial Intelligence (AI) and Machine Learning (ML) methodologies to extract invaluable insights from imaging data. Nonetheless, the imperative need for patient privacy poses significant challenges when collecting data from diverse healthcare sources. Consequently, Deep Learning (DL) communities occasionally face difficulties detecting rare features. In this research endeavor, we introduce the Ensemble-Based Federated Learning (EBFL) Framework, an innovative solution tailored to address this challenge. The EBFL framework deviates from the conventional approach by emphasizing model features over sharing sensitive patient data. This unique methodology fosters a collaborative and privacy-conscious environment for healthcare institutions, empowering them to harness the capabilities of a centralized server for model refinement while upholding the utmost data privacy standards. Moreover, a robust ensemble architecture boasts potent feature extraction capabilities, distinguishing itself from a single DL model. This quality makes it remarkably dependable for MRI analysis. By harnessing our EBFL methodology, we have achieved remarkable precision in the classification of brain tumors, including glioma, meningioma, pituitary, and non-tumor instances, attaining a precision rate of 94% for the global model and an impressive 96% for the ensemble model. Our models underwent rigorous evaluation using conventional performance metrics such as accuracy, precision, recall, and F1 score. Integrating DL within the Federated Learning (FL) framework has yielded a methodology that offers precise and dependable diagnostics for detecting brain tumors.
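The sharing-models-not-data idea can be sketched with the standard FedAvg aggregation rule, where only parameter vectors (random stand-ins below) ever reach the central server:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each institution trains locally on its private MRI data; the weights
# below are random stand-ins for real model parameters.
n_clients = 3
client_sizes = np.array([120, 80, 200])          # local dataset sizes (made up)
client_weights = [rng.standard_normal(10) for _ in range(n_clients)]

# Size-weighted average: the standard FedAvg aggregation rule, which an
# ensemble-based variant like EBFL would build on.
coeffs = client_sizes / client_sizes.sum()
global_weights = sum(c * w for c, w in zip(coeffs, client_weights))
```

In a full round, the server broadcasts `global_weights` back to the clients, each client trains locally again, and the cycle repeats; the raw imaging data never leaves the institutions.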
https://arxiv.org/abs/2403.09836