Self-supervised contrastive learning has emerged as one of the most successful deep learning paradigms. In this regard, it has seen extensive use in image registration and, more recently, in the particular field of medical image registration. In this work, we propose to test, extend, and improve ConKeD, a state-of-the-art framework for color fundus image registration. Using the ConKeD framework, we test multiple loss functions, adapting them to the framework and the application domain. Furthermore, we evaluate our models using the standardized benchmark dataset FIRE as well as several datasets that have never before been used for color fundus registration, for which we release the pairing data together with a standardized evaluation approach. Our work demonstrates state-of-the-art performance across all datasets and metrics, showing several advantages over current SOTA color fundus registration methods.
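ConKeD learns keypoint descriptors contrastively. As a concrete illustration of the kind of loss such a framework can test, here is a minimal InfoNCE-style descriptor loss in PyTorch; the function name, temperature, and batching are illustrative assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def info_nce_descriptor_loss(desc_a, desc_b, temperature=0.07):
    """Contrastive loss over matched keypoint descriptors.

    desc_a, desc_b: (N, D) descriptors from two views of the same fundus
    image; row i of desc_a matches row i of desc_b, and all other rows
    act as negatives. Illustrative sketch, not the paper's exact loss.
    """
    desc_a = F.normalize(desc_a, dim=1)
    desc_b = F.normalize(desc_b, dim=1)
    logits = desc_a @ desc_b.t() / temperature      # (N, N) similarity matrix
    targets = torch.arange(desc_a.size(0), device=desc_a.device)
    return F.cross_entropy(logits, targets)
```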
https://arxiv.org/abs/2404.16773
Recently, deep learning-based language models have significantly enhanced text-to-SQL tasks, with promising applications in retrieving patient records within the medical domain. One notable challenge in such applications is discerning unanswerable queries. Through model fine-tuning, we demonstrate the feasibility of converting medical record inquiries into SQL queries. Additionally, we introduce an entropy-based method to identify and filter out unanswerable results. We further enhance result quality by filtering low-confidence SQL based on the log-probability distribution of the generated tokens, while grammatical and schema errors are mitigated by executing queries on the actual database. We experimentally verified that our method can filter out unanswerable questions, that it can be widely applied even when the model's parameters are not accessible, and that it is effective in practice.
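A minimal sketch of the filtering idea, assuming access to the decoder's per-step token distributions and per-token log-probabilities (all names and thresholds are illustrative, not the paper's):

```python
import math

def mean_entropy(step_distributions):
    """Average Shannon entropy over the decoder's per-step token
    distributions; high entropy signals an uncertain (likely
    unanswerable) query. step_distributions: list of probability lists."""
    ent = 0.0
    for probs in step_distributions:
        ent += -sum(p * math.log(p) for p in probs if p > 0)
    return ent / len(step_distributions)

def filter_generations(candidates, max_entropy=2.0, min_logprob=-0.35):
    """Drop SQL generations that are high-entropy or low log-probability;
    both thresholds are illustrative and would be tuned on a dev set."""
    kept = []
    for sql, dists, token_logprobs in candidates:
        confident = sum(token_logprobs) / len(token_logprobs) >= min_logprob
        certain = mean_entropy(dists) <= max_entropy
        if confident and certain:
            kept.append(sql)
    return kept
```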
https://arxiv.org/abs/2404.16659
In below-freezing winter conditions, road surface friction can vary greatly with the mixture of snow, ice, and water on the road. Friction between the road and vehicle tyres is a critical parameter defining vehicle dynamics, and road surface friction information is therefore essential for several intelligent transportation applications, such as safe control of automated vehicles or alerting drivers to slippery road conditions. This paper explores computer vision-based evaluation of road surface friction from roadside cameras. Previous studies have extensively investigated the application of convolutional neural networks to evaluating road surface condition from images. Here, we propose a hybrid deep learning architecture, WCamNet, consisting of a pretrained vision transformer model and convolutional blocks. The motivation for the architecture is to combine the general visual features provided by the transformer model with the fine-tuned feature extraction properties of the convolutional blocks. To benchmark the approach, an extensive dataset was gathered from the national Finnish road infrastructure network of roadside cameras and optical road surface friction sensors. The results highlight that the proposed WCamNet outperforms previous approaches at predicting road surface friction from roadside camera images.
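A minimal sketch of the hybrid idea, pairing a frozen pretrained ViT with a small trainable CNN branch; this is a stand-in for the concept only, not the WCamNet architecture itself, whose exact layout the abstract does not specify:

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

class HybridFrictionNet(nn.Module):
    """Illustrative hybrid: a pretrained ViT supplies general visual
    features while a small trainable CNN branch extracts task-specific
    ones; both are fused to regress a friction value in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.vit = vit_b_16(weights=ViT_B_16_Weights.DEFAULT)
        self.vit.heads = nn.Identity()          # expose the 768-d embedding
        for p in self.vit.parameters():         # keep the backbone frozen
            p.requires_grad = False
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(768 + 64, 128), nn.ReLU(),
                                  nn.Linear(128, 1), nn.Sigmoid())

    def forward(self, x):                       # x: (B, 3, 224, 224)
        fused = torch.cat([self.vit(x), self.cnn(x)], dim=1)
        return self.head(fused)
```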
https://arxiv.org/abs/2404.16578
In recent years, with the rapid development of computer information technology, the development of artificial intelligence has accelerated. Traditional geometry recognition techniques are relatively outdated and suffer from low recognition rates. When faced with massive information databases, traditional algorithm models inevitably exhibit low recognition accuracy and poor performance. Deep learning theory has gradually become a very important part of machine learning, and the implementation of convolutional neural networks (CNNs) reduces the difficulty of the graphics generation algorithm. In this paper, exploiting the weight-sharing, feature-extraction, and classification advantages of the LeNet-5 architecture, the proposed geometric pattern recognition model trains faster on the training dataset. By constructing shared feature parameters in the model and using a cross-entropy loss function during recognition, we improve the model's generalization and the average recognition accuracy on the test dataset.
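For reference, a minimal LeNet-5-style model with the cross-entropy objective described above; layer sizes assume 32x32 grayscale inputs and are illustrative:

```python
import torch.nn as nn

class LeNet5(nn.Module):
    """Classic LeNet-5-style network with shared convolutional weights,
    standing in for the paper's geometric pattern recognizer."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, 5), nn.Tanh(), nn.AvgPool2d(2),   # 32->28->14
            nn.Conv2d(6, 16, 5), nn.Tanh(), nn.AvgPool2d(2),  # 14->10->5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(), nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

criterion = nn.CrossEntropyLoss()  # the cross-entropy loss from the paper
```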
https://arxiv.org/abs/2404.16561
Scour around bridge piers is a critical challenge for infrastructure around the world. In the absence of analytical models and due to the complexity of the scour process, it is difficult for current empirical methods to achieve accurate predictions. In this paper, we exploit the power of deep learning algorithms to forecast scour depth variations around bridge piers from historical sensor monitoring data, including riverbed elevation, flow elevation, and flow velocity. We investigated the performance of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) models for real-time scour forecasting using data collected from bridges in Alaska and Oregon from 2006 to 2021. The LSTM models achieved mean absolute errors (MAE) ranging from 0.1 m to 0.5 m when predicting bed level variations a week in advance, a reasonable performance. The Fully Convolutional Network (FCN) variant of CNN outperformed the other CNN configurations, performing comparably to the LSTMs at significantly lower computational cost. We explored various innovative random-search heuristics for hyperparameter tuning and model optimisation, which reduced computational cost compared to the grid-search method. The impact of different combinations of sensor features on scour prediction showed the significance of the historical scour time series for predicting upcoming events. Overall, this study provides a greater understanding of the potential of Deep Learning (DL) for real-time scour forecasting and early warning in bridges with diverse scour and flow characteristics, including riverine and tidal/coastal bridges.
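A minimal sketch of such a sequence-to-one LSTM forecaster; hidden sizes and the input window are illustrative, not the paper's tuned configuration:

```python
import torch
import torch.nn as nn

class ScourLSTM(nn.Module):
    """Maps a window of sensor readings (riverbed elevation, flow
    elevation, flow velocity) to the bed elevation a fixed horizon
    ahead. Illustrative stand-in for the paper's LSTM models."""
    def __init__(self, n_features=3, hidden=64, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, layers, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])   # predict from the last time step

model = ScourLSTM()
week_ahead = model(torch.randn(8, 168, 3))  # e.g. 168 hourly readings
```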
https://arxiv.org/abs/2404.16549
In this paper, we propose a novel approach to camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems. Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance. Specifically, we extract 2D features from camera images using a state-of-the-art deep learning architecture and then apply a novel Cross-Domain Spatial Matching (CDSM) transformation to convert these features into 3D space. We then fuse them with extracted radar data using a complementary fusion strategy to produce a final 3D object representation. To demonstrate the effectiveness of our approach, we evaluate it on the NuScenes dataset. We compare our approach to both single-sensor baselines and current state-of-the-art fusion methods. Our results show that the proposed approach outperforms single-sensor solutions and competes directly with other top-level fusion methods.
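The abstract does not spell out the CDSM transform; as a hedged sketch of the generic camera-to-3D lifting step it stands in for, one can project voxel centers into the image with known intrinsics and sample the 2D feature map there (all names are illustrative):

```python
import torch
import torch.nn.functional as F

def lift_camera_features(feat2d, voxels, K, img_size):
    """Project 3D voxel centers into the image and sample 2D features
    at those locations - a generic lifting step, not the paper's CDSM.

    feat2d: (1, C, H, W) camera feature map
    voxels: (N, 3) voxel centers in the camera frame (z > 0)
    K:      (3, 3) camera intrinsics; img_size: (W, H) in pixels
    """
    uvw = voxels @ K.t()                      # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]             # pixel coordinates
    W, H = img_size
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1)
    grid = (grid * 2 - 1).view(1, 1, -1, 2)   # normalize to [-1, 1]
    samp = F.grid_sample(feat2d, grid, align_corners=True)
    return samp.reshape(feat2d.size(1), -1).t()  # (N, C) per-voxel features

def fuse(cam_feats, radar_feats):
    """Complementary fusion by concatenation (one simple choice)."""
    return torch.cat([cam_feats, radar_feats], dim=-1)
```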
https://arxiv.org/abs/2404.16548
In deep learning applications, robustness measures a neural model's ability to handle slight changes in input data, which could otherwise lead to safety hazards, especially in safety-critical applications. Pre-deployment assessment of model robustness is essential, but existing methods often suffer from either high costs or imprecise results. To enhance safety in real-world scenarios, metrics that effectively capture a model's robustness are needed. To address this issue, we compare the rigour and usage conditions of various assessment methods based on different definitions. We then propose a straightforward and practical metric based on hypothesis testing for probabilistic robustness and integrate it into the TorchAttacks library. Through a comparative analysis of diverse robustness assessment methods, our approach contributes to a deeper understanding of model robustness in safety-critical applications.
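A minimal sketch of a hypothesis test for probabilistic robustness: sample perturbations in an epsilon-ball and run a one-sided binomial test on the correct-prediction rate. The noise model, thresholds, and names are illustrative assumptions, not the paper's exact procedure:

```python
import torch
from scipy.stats import binomtest

def probabilistic_robustness_test(model, x, label, eps=0.03,
                                  n=1000, target_p=0.95, alpha=0.01):
    """Test whether P(correct under uniform noise within an eps-ball)
    exceeds target_p, with statistical significance alpha."""
    model.eval()
    with torch.no_grad():
        noise = (torch.rand(n, *x.shape) * 2 - 1) * eps
        preds = model(x.unsqueeze(0) + noise).argmax(dim=1)
    k = int((preds == label).sum())
    # H0: success probability <= target_p; rejecting H0 deems the
    # model probabilistically robust at this input.
    result = binomtest(k, n, target_p, alternative='greater')
    return result.pvalue < alpha, k / n
```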
https://arxiv.org/abs/2404.16457
Cell tracking remains a pivotal yet challenging task in biomedical research. The full potential of deep learning for this purpose is often untapped due to the limited availability of comprehensive and varied training data sets. In this paper, we present SynCellFactory, a generative cell video augmentation method. At the heart of SynCellFactory lies the ControlNet architecture, fine-tuned to synthesize cell imagery with photorealistic accuracy in style and motion patterns. This technique enables the creation of synthetic yet realistic cell videos that mirror the complexity of authentic microscopy time-lapses. Our experiments demonstrate that SynCellFactory boosts the performance of well-established deep learning models for cell tracking, particularly when original training data is sparse.
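Assuming the fine-tuned ControlNet were released as diffusers-compatible weights, frame generation might look like the following sketch; the checkpoint path, prompt, and conditioning map are hypothetical, not artifacts from the paper:

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Hypothetical checkpoint path; SynCellFactory's actual weights are
# not named in the abstract.
controlnet = ControlNetModel.from_pretrained("path/to/cell-controlnet",
                                             torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

position_map = Image.new("RGB", (512, 512))  # stand-in conditioning map
frame = pipe("fluorescence microscopy cell image",
             image=position_map,             # e.g. encoded cell positions
             num_inference_steps=30).images[0]
```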
https://arxiv.org/abs/2404.16421
The interactions between tumor cells and the tumor microenvironment (TME) dictate the therapeutic efficacy of radiation and many systemic therapies in breast cancer. However, to date, there is no widely available method to reproducibly measure tumor and immune phenotypes for each patient's tumor. Given this unmet clinical need, we applied multiple instance learning (MIL) algorithms to assess the activity of ten biologically relevant pathways from hematoxylin and eosin (H&E) slides of primary breast tumors. We employed different feature extraction approaches and state-of-the-art model architectures. Using binary classification, our models attained area under the receiver operating characteristic curve (AUROC) scores above 0.70 for nearly all gene expression pathways and, in some cases, exceeded 0.80. Attention maps suggest that our trained models recognize biologically relevant spatial patterns of cell sub-populations from H&E. These efforts represent a first step towards developing computational H&E biomarkers that reflect facets of the TME and hold promise for augmenting precision oncology.
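A common MIL head of the kind such pipelines use is gated attention pooling (Ilse et al., 2018); a minimal sketch, standing in for the paper's unspecified architectures, with illustrative sizes:

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Gated-attention MIL head: pools a bag of patch features from one
    H&E slide into a slide-level prediction, exposing per-patch
    attention weights for the attention maps described above."""
    def __init__(self, in_dim=1024, hidden=256):
        super().__init__()
        self.attn_V = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh())
        self.attn_U = nn.Sequential(nn.Linear(in_dim, hidden), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(in_dim, 1)

    def forward(self, bag):                    # bag: (n_patches, in_dim)
        a = self.attn_w(self.attn_V(bag) * self.attn_U(bag))  # (n, 1)
        a = torch.softmax(a, dim=0)            # attention over patches
        slide = (a * bag).sum(dim=0)           # weighted-average pooling
        return self.classifier(slide), a       # logit + attention weights
```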
https://arxiv.org/abs/2404.16397
Deep convolutional neural networks (DCNNs) are a class of artificial neural networks used primarily for computer vision tasks such as segmentation and classification. Many nonlinear operations, such as activation functions and pooling strategies, are used in DCNNs to enhance their ability to process different signals for different tasks. Conventional convolution, a linear filter, is the essential component of DCNNs, while nonlinear convolution is generally implemented as higher-order Volterra filters. However, the significant memory and computational costs of Volterra filtering are the primary limitation to its widespread use in DCNNs. In this study, we propose a novel method to perform higher-order Volterra filtering with lower memory and computation cost in both the forward and backward passes of DCNN training. The proposed method demonstrates computational advantages over conventional Volterra filter implementations. Furthermore, based on the proposed method, a new attention module called the Higher-order Local Attention Block (HLA) is proposed and tested on the CIFAR-100 dataset, showing competitive improvements on the classification task. Source code is available at: this https URL
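For intuition, here is a naive second-order Volterra layer, the memory-heavy baseline formulation whose cost the paper's method reduces; sizes, initialization, and names are illustrative:

```python
import torch
import torch.nn as nn

class Volterra2d(nn.Module):
    """Naive second-order Volterra layer: a linear convolution plus a
    quadratic term over all pixel pairs in each k x k window. Note the
    O(d^2) second-order weight tensor - the cost the paper attacks."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        d = in_ch * k * k
        self.linear = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.w2 = nn.Parameter(torch.randn(out_ch, d, d) * 0.01)

    def forward(self, x):
        B, _, H, W = x.shape
        patches = nn.functional.unfold(x, self.k, padding=self.k // 2)
        patches = patches.transpose(1, 2)                 # (B, HW, d)
        # quadratic term: p^T W2 p for every window and output channel
        quad = torch.einsum('bnd,ode,bne->bno', patches, self.w2, patches)
        quad = quad.transpose(1, 2).reshape(B, -1, H, W)
        return self.linear(x) + quad
```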
https://arxiv.org/abs/2404.16380
Despite the remarkable success of deep learning in medical imaging analysis, medical image segmentation remains challenging due to the scarcity of high-quality labeled images for supervision. Further, the significant domain gap between natural and medical images in general, and ultrasound images in particular, hinders fine-tuning models trained on natural images to the task at hand. In this work, we address the performance degradation of segmentation models in low-data regimes and propose a prompt-less segmentation method harnessing the ability of segmentation foundation models to segment abstract shapes. We do so via a novel prompt point generation algorithm that uses coarse semantic segmentation masks as input and a zero-shot promptable foundation model as an optimization target. We demonstrate our method on a segmentation findings task (pathologic anomalies) in ultrasound images. Its advantages come to light in low-data-regime experiments on a small-scale musculoskeletal ultrasound image dataset, with the performance gain growing as the training set size decreases.
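A simple stand-in for the prompt point generation step: sample positive clicks inside the coarse mask and negative clicks outside, in the point-prompt format promptable models such as SAM expect. The paper's actual algorithm optimizes these points; this version merely samples them:

```python
import numpy as np

def sample_prompt_points(coarse_mask, n_pos=5, n_neg=5, seed=0):
    """Turn a coarse boolean semantic mask into point prompts:
    positive clicks inside the mask, negative clicks outside."""
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(coarse_mask)
    bg_ys, bg_xs = np.nonzero(~coarse_mask)
    pos = rng.choice(len(ys), size=min(n_pos, len(ys)), replace=False)
    neg = rng.choice(len(bg_ys), size=min(n_neg, len(bg_ys)), replace=False)
    points = np.concatenate([np.stack([xs[pos], ys[pos]], axis=1),
                             np.stack([bg_xs[neg], bg_ys[neg]], axis=1)])
    labels = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    return points, labels   # (x, y) coords + 1/0 labels, SAM's convention
```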
https://arxiv.org/abs/2404.16325
This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset span diverse content (sports, games, lyrics, anime, etc.), qualities, and resolutions. The proposed methods must process 30 FHD frames in under 1 second. In the challenge, a total of 102 participants registered, and 15 submitted code and models. The performance of the top-5 submissions is reviewed here as a survey of diverse deep models for efficient video quality assessment of user-generated content.
https://arxiv.org/abs/2404.16205
Analyzing volumetric data with rotational invariance or equivariance is an active topic in current research. Existing deep-learning approaches utilize either group convolutional networks limited to discrete rotations or steerable convolutional networks with constrained filter structures. This work proposes a novel equivariant neural network architecture, the EquiLoPO Network, that achieves analytical Equivariance to Local Pattern Orientation on the continuous SO(3) group while allowing unconstrained trainable filters. Our key innovations are a group convolutional operation leveraging irreducible representations as the Fourier basis and a local activation function in the SO(3) space that provides a well-defined mapping from input to output functions, preserving equivariance. By integrating these operations into a ResNet-style architecture, we obtain a model that overcomes the limitations of prior methods. A comprehensive evaluation on diverse 3D medical imaging datasets from MedMNIST3D demonstrates the effectiveness of our approach, which consistently outperforms the state of the art. This work underscores the benefits of true rotational equivariance on SO(3) and of the flexible, unconstrained filters enabled by the local activation function, providing a flexible framework for equivariant deep learning on volumetric data with potential applications across domains. Our code is publicly available at \url{this https URL}.
https://arxiv.org/abs/2404.15979
Patellofemoral joint (PFJ) issues affect one in four people, with 20% experiencing chronic knee pain despite treatment. Poor outcomes and pain after knee replacement surgery are often linked to patellar mal-tracking. Traditional imaging methods like CT and MRI face challenges, including cost and metal artefacts, and there is currently no ideal way to observe joint motion without issues such as soft tissue artefacts or radiation exposure. A new system to monitor joint motion could significantly improve understanding of PFJ dynamics, aiding better patient care and outcomes. Combining 2D ultrasound with motion tracking for 3D reconstruction of the joint, using semantic segmentation and position registration, is one possible solution. However, the need for expensive external infrastructure to estimate the scanner's trajectory remains the main barrier to implementing 3D bone reconstruction from handheld ultrasound scanning clinically. We propose Visual-Inertial Odometry (VIO) and a deep learning-based inertial-only odometry method as alternatives to motion capture for tracking a handheld ultrasound scanner. The 3D reconstructions generated by these methods demonstrate potential for assessing the PFJ and for further measurements from free-hand ultrasound scans. The results show that the VIO method performs as well as motion capture, with average reconstruction errors of 1.25 mm and 1.21 mm, respectively. The VIO method is the first infrastructure-free method for 3D reconstruction of bone from wireless handheld ultrasound scanning with an accuracy comparable to methods that require external infrastructure.
https://arxiv.org/abs/2404.15847
The integration of deep learning-based systems into clinical practice is often impeded by challenges rooted in limited and heterogeneous medical datasets. In addition, the prioritization of marginal performance improvements on a few, narrowly scoped benchmarks over clinical applicability has slowed down meaningful algorithmic progress. This trend often results in excessive fine-tuning of existing methods to achieve state-of-the-art performance on selected datasets rather than fostering clinically relevant innovations. In response, this work presents a comprehensive benchmark for the MedMNIST+ database to diversify the evaluation landscape and conducts a thorough analysis of common convolutional neural networks (CNNs) and Transformer-based architectures for medical image classification. Our evaluation encompasses various medical datasets, training methodologies, and input resolutions, aiming to reassess the strengths and limitations of widely used model variants. Our findings suggest that computationally efficient training schemes and modern foundation models hold promise in bridging the gap between expensive end-to-end training and more resource-efficient approaches. Additionally, contrary to prevailing assumptions, we observe that higher resolutions may not consistently improve performance beyond a certain threshold, advocating the use of lower resolutions, particularly in prototyping stages, to expedite processing. Notably, our analysis reaffirms the competitiveness of convolutional models compared to ViT-based architectures, emphasizing the importance of comprehending the intrinsic capabilities of different model architectures. Moreover, we hope that our standardized evaluation framework will help enhance transparency, reproducibility, and comparability on the MedMNIST+ dataset collection as well as in future research within the field. Code will be released soon.
https://arxiv.org/abs/2404.15786
Autonomous vehicles (AVs) rely heavily on LiDAR perception for environment understanding and navigation. LiDAR intensity provides valuable information about the reflected laser signals and plays a crucial role in enhancing the perception capabilities of AVs. However, accurately simulating LiDAR intensity remains a challenge due to the unavailability of material properties of the objects in the environment and the complex interactions between the laser beam and the environment. The proposed method aims to improve the accuracy of intensity simulation by incorporating physics-based modalities within the deep learning framework. One of the key quantities that captures the interaction between the laser beam and an object is the angle of incidence. In this work, we demonstrate that adding the LiDAR incidence angle as a separate input to deep neural networks significantly improves the results. We present a comparative study between two prominent deep learning architectures: U-NET, a Convolutional Neural Network (CNN), and Pix2Pix, a Generative Adversarial Network (GAN). We implemented both architectures for the intensity prediction task and used the SemanticKITTI and VoxelScape datasets for experiments. The comparative analysis reveals that both architectures benefit from the incidence angle as an additional input. Moreover, the Pix2Pix architecture outperforms U-NET, especially when the incidence angle is incorporated.
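In practice, the incidence-angle input amounts to one extra channel on the network's input tensor, as in this minimal sketch (shapes and the stem layer are illustrative):

```python
import torch
import torch.nn as nn

# Range-image representation of a LiDAR sweep plus a per-point
# incidence-angle map, stacked channel-wise so any image-to-image
# network (U-Net, Pix2Pix generator, ...) can consume both.
range_img = torch.rand(1, 1, 64, 1024)       # (B, 1, H, W) range channel
incidence = torch.rand(1, 1, 64, 1024)       # per-point incidence angle
x = torch.cat([range_img, incidence], dim=1) # (B, 2, H, W) network input

first_conv = nn.Conv2d(2, 64, kernel_size=3, padding=1)  # 2-channel stem
features = first_conv(x)
```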
https://arxiv.org/abs/2404.15774
Modular deep learning is the state-of-the-art solution for lifting the curse of multilinguality, preventing the impact of negative interference and enabling cross-lingual performance in Multilingual Pre-trained Language Models. However, a trade-off of this approach is reduced positive transfer from closely related languages. In response, we introduce a novel method called language arithmetic, which enables training-free post-processing to address this limitation. Inspired by the task arithmetic framework, we apply learning via addition to the language adapters, transitioning the framework from a multi-task to a multilingual setup. The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes, where it acts as a post-processing procedure. Language arithmetic consistently improves over the baselines, with significant gains in the most challenging cases of zero-shot and low-resource applications. Our code and models are available at this https URL.
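A minimal sketch of the "learning via addition" step on adapter weights; the state-dict handling, mixing weight, and usage lines are illustrative assumptions, not the paper's exact recipe:

```python
import torch

def language_arithmetic(adapter_a, adapter_b, weight=0.5):
    """Training-free combination of two language adapters: add their
    parameters tensor-by-tensor. adapter_a/adapter_b are state dicts
    with identically shaped entries; weight would be tuned on a dev set."""
    return {name: adapter_a[name] + weight * adapter_b[name]
            for name in adapter_a}

# e.g. strengthen a low-resource target adapter with a related language:
# merged = language_arithmetic(target_adapter, related_adapter)
# model.load_state_dict(merged, strict=False)
```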
https://arxiv.org/abs/2404.15737
Despite considerable strides in developing deep learning models for 3D medical image segmentation, the challenge of effectively generalizing across diverse image distributions persists. While domain generalization is acknowledged as vital for robust application in clinical settings, the challenges stemming from training with a limited Field of View (FOV) remain unaddressed. This limitation leads to false predictions when a model is applied to body regions beyond the FOV of the training data. In response, we propose a novel loss function that penalizes predictions in implausible body regions, applicable in both single-dataset and multi-dataset training schemes. It is realized with a Body Part Regression model that generates axial slice positional scores. Through comprehensive evaluation on a test set featuring varying FOVs, our approach demonstrates remarkable improvements in generalization: it reduces false-positive tumor predictions by up to 85% and significantly enhances overall segmentation performance.
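One plausible reading of the proposed loss, sketched minimally: mask out axial slices whose body-part-regression score is implausible for the target structure and penalize any foreground probability there. Tensor shapes, the score range, and the weighting are assumptions, not the paper's specification:

```python
import torch

def implausible_region_penalty(probs, slice_scores, lo, hi):
    """Penalize foreground probability mass on axial slices whose
    body-part-regression score falls outside the plausible range
    [lo, hi] for the target organ.

    probs:        (B, Z, H, W) predicted foreground probabilities
    slice_scores: (B, Z) axial positional scores from a BPR model
    """
    implausible = (slice_scores < lo) | (slice_scores > hi)   # (B, Z)
    mask = implausible[..., None, None].float()               # broadcast
    return (probs * mask).mean()

# total = dice_ce_loss + lambda_penalty * implausible_region_penalty(...)
```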
https://arxiv.org/abs/2404.15718
Deepfakes, synthetic images generated by deep learning algorithms, represent one of the biggest challenges in the field of Digital Forensics. The scientific community is working to develop approaches that can discriminate the origin of digital images (real or AI-generated). However, these methodologies face the challenge of generalization: the ability to discern the nature of an image even when it is generated by an architecture not seen during training, where performance usually drops. In this context, we propose a novel approach based on three blocks called Base Models, each of which is responsible for extracting the discriminative features of a specific image class (Diffusion Model-generated, GAN-generated, or real), trained on deliberately unbalanced datasets. The features extracted from each block are then concatenated and processed to discriminate the origin of the input image. Experimental results show that this approach is not only robust to JPEG compression but also outperforms state-of-the-art methods in several generalization tests. Code, models, and the dataset are available at this https URL.
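A minimal sketch of the three-branch design; the backbone factory, feature size, and head are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class ThreeBranchDetector(nn.Module):
    """Three 'Base Models', each trained (on deliberately unbalanced
    data) to specialize in one origin - diffusion-generated,
    GAN-generated, or real - with their features concatenated for the
    final origin decision."""
    def __init__(self, make_backbone, feat_dim=512, n_classes=3):
        super().__init__()
        self.branches = nn.ModuleList([make_backbone() for _ in range(3)])
        self.head = nn.Linear(3 * feat_dim, n_classes)

    def forward(self, x):
        feats = [b(x) for b in self.branches]   # each (B, feat_dim)
        return self.head(torch.cat(feats, dim=1))
```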
https://arxiv.org/abs/2404.15697
Humans effortlessly interpret images by parsing them into part-whole hierarchies; deep learning excels at learning multi-level feature spaces but often lacks explicit coding of part-whole relations, a prominent property of medical imaging. To overcome this limitation, we introduce Adam-v2, a new self-supervised learning framework extending Adam [79] by explicitly incorporating part-whole hierarchies into its learning objectives through three key branches: (1) Localizability, acquiring discriminative representations to distinguish different anatomical patterns; (2) Composability, learning each anatomical structure in a parts-to-whole manner; and (3) Decomposability, comprehending each anatomical structure in a whole-to-parts manner. Experimental results across 10 tasks, compared against 11 baselines in zero-shot, few-shot transfer, and full fine-tuning settings, showcase Adam-v2's superior performance over large-scale medical models and existing SSL methods across diverse downstream tasks. The higher generality and robustness of Adam-v2's representations originate from its explicit construction of hierarchies for distinct anatomical structures from unlabeled medical images. Adam-v2 preserves a semantic balance of anatomical diversity and harmony in its embedding, yielding representations that are both generic and semantically meaningful, a property overlooked in existing SSL methods. All code and pretrained models are available at this https URL.
https://arxiv.org/abs/2404.15672