Deep_Learning

Ti-Patch: Tiled Physical Adversarial Patch for no-reference video quality metrics

2024-04-15 17:38:47

Victoria Leonenkova, Ekaterina Shumitskaya, Anastasia Antsiferova, Dmitriy Vatolin

arXiv_CV

arXiv_CV Deep_Learning Adversarial Knowledge Pose
Abstract

Objective no-reference image- and video-quality metrics are crucial in many computer vision tasks. However, state-of-the-art no-reference metrics have become learning-based and are vulnerable to adversarial attacks. This vulnerability restricts the use of such metrics in quality control systems and in comparing objective algorithms. Also, using vulnerable metrics as a loss for deep learning model training can mislead training and worsen visual quality. For that reason, testing quality metrics for vulnerability is a task of current interest. This paper proposes a new method for testing quality metrics vulnerability in the physical space. To our knowledge, quality metrics were not previously tested for vulnerability to this attack; they were only tested in the pixel space. We applied a physical adversarial Ti-Patch (Tiled Patch) attack to quality metrics and ran experiments in both pixel and physical space. We also performed experiments on the implementation of physical adversarial wallpaper. The proposed method can be used as an additional step in evaluating the vulnerability of quality metrics, complementing traditional subjective comparison and vulnerability tests in the pixel space. We made our code and adversarial videos available on GitHub: this https URL.
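The tiling idea behind a "tiled patch" can be sketched in a few lines. This is only an illustration of repeating a small patch over a region of a frame; the abstract does not describe how the patch itself is optimized, and the function names, the grayscale list-of-lists frame, and the region coordinates are all assumptions made for this example.

```python
def tile_patch(patch, height, width):
    """Repeat a small 2D patch until it covers a height x width canvas."""
    ph, pw = len(patch), len(patch[0])
    return [[patch[y % ph][x % pw] for x in range(width)] for y in range(height)]


def apply_tiled_patch(frame, patch, top, left, height, width):
    """Return a copy of a grayscale frame with one rectangle replaced by the tiled patch."""
    tiled = tile_patch(patch, height, width)
    out = [row[:] for row in frame]  # copy so the input frame is left untouched
    for y in range(height):
        for x in range(width):
            out[top + y][left + x] = tiled[y][x]
    return out


# Hypothetical 4x6 frame and 2x2 patch, chosen only for illustration.
frame = [[0] * 6 for _ in range(4)]
patch = [[1, 2], [3, 4]]
attacked = apply_tiled_patch(frame, patch, top=1, left=1, height=3, width=4)
for row in attacked:
    print(row)
```

In a real attack the patch values would come from an optimization against the target metric; here they are fixed numbers so the tiling pattern is easy to see.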

URL

https://arxiv.org/abs/2404.09961

PDF

https://arxiv.org/pdf/2404.09961.pdf
How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model

2024-04-15 17:31:32

Hanxue Gu, Haoyu Dong, Jichen Yang, Maciej A. Mazurowski

arXiv_CV

arXiv_CV Segmentation Deep_Learning Few-Shot Medical Self-Supervised
Abstract

Automated segmentation is a fundamental medical image analysis task, which has advanced significantly with the advent of deep learning. While foundation models have been useful in natural language processing and some vision tasks for some time, the foundation model designed for image segmentation, the Segment Anything Model (SAM), was developed only recently and has shown similar promise. However, there are still no systematic analyses or "best-practice" guidelines for optimal fine-tuning of SAM for medical image segmentation. This work summarizes existing fine-tuning strategies with various backbone architectures, model components, and fine-tuning algorithms across 18 combinations, and evaluates them on 17 datasets covering all common radiology modalities. Our study reveals that (1) fine-tuning SAM leads to slightly better performance than previous segmentation methods, (2) fine-tuning strategies that use parameter-efficient learning in both the encoder and decoder are superior to other strategies, (3) network architecture has a small impact on final performance, and (4) further training SAM with self-supervised learning can improve final model performance. We also demonstrate the ineffectiveness of some methods popular in the literature and further expand our experiments into few-shot and prompt-based settings. Lastly, we released our code and MRI-specific fine-tuned weights, which consistently obtained superior performance over the original SAM, at this https URL.

URL

https://arxiv.org/abs/2404.09957

PDF

https://arxiv.org/pdf/2404.09957.pdf
Unifying Global and Local Scene Entities Modelling for Precise Action Spotting

2024-04-15 17:24:57

Kim Hoang Tran, Phuc Vuong Do, Ngoc Quoc Ly, Ngan Le

arXiv_CV

arXiv_CV Deep_Learning Face Attention Language_Model Transformer Pose Action
Abstract

Sports videos pose complex challenges, including cluttered backgrounds, camera angle changes, small action-representing objects, and imbalanced action class distribution. Existing methods for detecting actions in sports videos rely heavily on global features, using a backbone network as a black box that encompasses the entire spatial frame. However, these approaches tend to overlook the nuances of the scene and struggle to detect actions that occupy a small portion of the frame. In particular, they face difficulties with action classes involving small objects, such as balls or yellow/red cards in soccer, which occupy only a fraction of the screen space. To address these challenges, we introduce a novel approach that analyzes and models scene entities using an adaptive attention mechanism. Specifically, our model disentangles the scene content into a global environment feature and a local relevant scene entities feature. To extract environmental features efficiently while considering temporal information at lower computational cost, we propose the use of a 2D backbone network with a time-shift mechanism. To accurately capture relevant scene entities, we employ a Vision-Language model in conjunction with the adaptive attention mechanism. Our model has demonstrated outstanding performance, securing 1st place in the SoccerNet-v2 Action Spotting, FineDiving, and FineGym challenges with improvements of 1.6, 2.0, and 1.3 points in avg-mAP, respectively, over the runner-up methods. Furthermore, our approach offers interpretability, in contrast to other deep learning models, which are often designed as black boxes. Our code and models are released at: this https URL.
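A time-shift mechanism on top of a 2D backbone can be pictured as moving a few feature channels one frame forward or backward, so that each frame sees a little of its neighbors at no extra multiply cost. The sketch below assumes a TSM-style shift with zero padding at the clip boundaries; the abstract does not specify the exact variant, and the function name, the `fold` parameter, and the toy clip are hypothetical.

```python
def temporal_shift(clip, fold):
    """Shift channels along time; clip[t][c] is channel c of frame t.

    Channels [0, fold) are taken from the previous frame, channels
    [fold, 2*fold) from the next frame, and the rest stay in place.
    Positions with no source frame are filled with 0.
    """
    num_frames, num_channels = len(clip), len(clip[0])
    shifted = [[0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t - 1   # feature comes from the past
            elif c < 2 * fold:
                src = t + 1   # feature comes from the future
            else:
                src = t       # unshifted channel
            if 0 <= src < num_frames:
                shifted[t][c] = clip[src][c]
    return shifted


# Three frames with three channels each; values chosen so frames are easy to tell apart.
clip = [[1, 10, 100], [2, 20, 200], [3, 30, 300]]
shifted = temporal_shift(clip, fold=1)
print(shifted)
```

After the shift, a per-frame 2D layer applied to `shifted[t]` mixes information from frames `t - 1`, `t`, and `t + 1`.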

URL

https://arxiv.org/abs/2404.09951

PDF

https://arxiv.org/pdf/2404.09951.pdf
eMotion-GAN: A Motion-based GAN for Photorealistic and Facial Expression Preserving Frontal View Synthesis

2024-04-15 17:08:53

Omar Ikne, Benjamin Allaert, Ioan Marius Bilasco, Hazem Wannous

arXiv_CV

arXiv_CV GAN Recognition Deep_Learning Face Pose Emotion
Abstract

Many existing facial expression recognition (FER) systems encounter substantial performance degradation when faced with variations in head pose. Numerous frontalization methods have been proposed to enhance these systems' performance under such conditions. However, they often introduce undesirable deformations, rendering them less suitable for precise facial expression analysis. In this paper, we present eMotion-GAN, a novel deep learning approach designed for frontal view synthesis while preserving facial expressions within the motion domain. Considering the motion induced by head variation as noise and the motion induced by facial expression as the relevant information, our model is trained to filter out the noisy motion in order to retain only the motion related to facial expression. The filtered motion is then mapped onto a neutral frontal face to generate the corresponding expressive frontal face. We conducted extensive evaluations using several widely recognized dynamic FER datasets, which encompass sequences exhibiting various degrees of head pose variations in both intensity and orientation. Our results demonstrate the effectiveness of our approach in significantly reducing the FER performance gap between frontal and non-frontal faces. Specifically, we achieved a FER improvement of up to +5% for small pose variations and up to +20% for larger pose variations. Code available at this https URL.

URL

https://arxiv.org/abs/2404.09940

PDF

https://arxiv.org/pdf/2404.09940.pdf
A Survey on Deep Learning for Theorem Proving

2024-04-15 17:07:55

Zhaoyu Li, Jialiang Sun, Logan Murphy, Qidong Su, Zenan Li, Xian Zhang, Kaiyu Yang, Xujie Si

arXiv_AI

arXiv_AI Deep_Learning Review Survey Language_Model
Abstract

Theorem proving is a fundamental aspect of mathematics, spanning from informal reasoning in mathematical language to rigorous derivations in formal systems. In recent years, the advancement of deep learning, especially the emergence of large language models, has sparked a notable surge of research exploring these techniques to enhance the process of theorem proving. This paper presents a pioneering comprehensive survey of deep learning for theorem proving by offering i) a thorough review of existing approaches across various tasks such as autoformalization, premise selection, proofstep generation, and proof search; ii) a meticulous summary of available datasets and strategies for data generation; iii) a detailed analysis of evaluation metrics and the performance of state-of-the-art; and iv) a critical discussion on the persistent challenges and the promising avenues for future exploration. Our survey aims to serve as a foundational reference for deep learning approaches in theorem proving, seeking to catalyze further research endeavors in this rapidly growing field.

URL

https://arxiv.org/abs/2404.09939

PDF

https://arxiv.org/pdf/2404.09939.pdf
Zero-shot Building Age Classification from Facade Image Using GPT-4

2024-04-15 16:47:22

Zichao Zeng, June Moh Goo, Xinglei Wang, Bin Chi, Meihui Wang, Jan Boehm

arXiv_AI

arXiv_AI Deep_Learning Classification Prediction Language_Model Transformer Zero-Shot Chat
Abstract

A building's age of construction is crucial for supporting many geospatial applications. Much current research focuses on estimating building age from facade images using deep learning. However, building an accurate deep learning model requires a considerable amount of labelled training data, and the trained models often have geographical constraints. Recently, large pre-trained vision language models (VLMs) such as GPT-4 Vision, which demonstrate significant generalisation capabilities, have emerged as potential training-free tools for dealing with specific vision tasks, but their applicability and reliability for building information remain unexplored. In this study, a zero-shot building age classifier for facade images is developed using prompts that include logical instructions. Taking London as a test case, we introduce a new dataset, FI-London, comprising facade images and building age epochs. Although the training-free classifier achieved a modest accuracy of 39.69%, the mean absolute error of 0.85 decades indicates that the model can predict building age epochs successfully albeit with a small bias. The ensuing discussion reveals that the classifier struggles to predict the age of very old buildings and is challenged by fine-grained predictions within 2 decades. Overall, the classifier utilising GPT-4 Vision is capable of predicting the rough age epoch of a building from a single facade image without any training.
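The gap between a modest exact-match accuracy and a small mean absolute error is easier to see with numbers. The sketch below assumes that each age epoch is encoded as an ordinal index with one step per epoch; the labels and predictions are made up for illustration and are not taken from FI-London.

```python
def accuracy(true_epochs, predicted_epochs):
    """Fraction of images whose predicted epoch index matches the label exactly."""
    matches = sum(t == p for t, p in zip(true_epochs, predicted_epochs))
    return matches / len(true_epochs)


def mean_absolute_error(true_epochs, predicted_epochs):
    """Average distance, in epoch steps, between prediction and label."""
    total = sum(abs(t - p) for t, p in zip(true_epochs, predicted_epochs))
    return total / len(true_epochs)


# Hypothetical ordinal epoch indices for five facade images.
true_epochs = [0, 3, 5, 7, 2]
predicted_epochs = [1, 3, 4, 7, 4]

acc = accuracy(true_epochs, predicted_epochs)
mae = mean_absolute_error(true_epochs, predicted_epochs)
print(f"accuracy={acc:.2f}, MAE={mae:.2f} epoch steps")
```

Only two of five predictions are exact here, yet the average miss is under one epoch step, which is the pattern the abstract reports: many predictions land in a neighboring epoch rather than far from the label.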

URL

https://arxiv.org/abs/2404.09921

PDF

https://arxiv.org/pdf/2404.09921.pdf
Interaction as Explanation: A User Interaction-based Method for Explaining Image Classification Models

2024-04-15 14:26:00

Hyeonggeun Yun

arXiv_AI

arXiv_AI Deep_Learning Classification Image_Classification Prediction Action
Abstract

In computer vision, explainable AI (xAI) methods seek to mitigate the 'black-box' problem by making the decision-making process of deep learning models more interpretable and transparent. Traditional xAI methods concentrate on visualizing input features that influence model predictions, providing insights primarily suited for experts. In this work, we present an interaction-based xAI method that enhances user comprehension of image classification models through their interaction. Thus, we developed a web-based prototype allowing users to modify images via painting and erasing, thereby observing changes in classification results. Our approach enables users to discern critical features influencing the model's decision-making process, aligning their mental models with the model's logic. Experiments conducted with five images demonstrate the potential of the method to reveal feature importance through user interaction. Our work contributes a novel perspective to xAI by centering on end-user engagement and understanding, paving the way for more intuitive and accessible explainability in AI systems.

URL

https://arxiv.org/abs/2404.09828

PDF

https://arxiv.org/pdf/2404.09828.pdf
Contrastive Pretraining for Visual Concept Explanations of Socioeconomic Outcomes

2024-04-15 13:13:56

Ivica Obadic, Alex Levering, Lars Pennig, Dario Oliveira, Diego Marcos, Xiaoxiang Zhu

arXiv_CV

arXiv_CV Deep_Learning Embedding Represenation_Learning Pose
Abstract

Predicting socioeconomic indicators from satellite imagery with deep learning has become an increasingly popular research direction. Post-hoc concept-based explanations can be an important step towards broader adoption of these models in policy-making as they enable the interpretation of socioeconomic outcomes based on visual concepts that are intuitive to humans. In this paper, we study the interplay between representation learning using an additional task-specific contrastive loss and post-hoc concept explainability for socioeconomic studies. Our results on two different geographical locations and tasks indicate that the task-specific pretraining imposes a continuous ordering of the latent space embeddings according to the socioeconomic outcomes. This improves the model's interpretability as it enables the latent space of the model to associate urban concepts with continuous intervals of socioeconomic outcomes. Further, we illustrate how analyzing the model's conceptual sensitivity for the intervals of socioeconomic outcomes can shed light on new insights for urban studies.

URL

https://arxiv.org/abs/2404.09768

PDF

https://arxiv.org/pdf/2404.09768.pdf
Deep Learning-Based Segmentation of Tumors in PET/CT Volumes: Benchmark of Different Architectures and Training Strategies

2024-04-15 13:03:42

Monika Górka, Daniel Jaworek, Marek Wodzinski

arXiv_CV

arXiv_CV Segmentation Deep_Learning
Abstract

Cancer is one of the leading causes of death globally, and early diagnosis is crucial for patient survival. Deep learning algorithms have great potential for automatic cancer analysis. Artificial intelligence has achieved high performance in recognizing and segmenting single lesions. However, diagnosing multiple lesions remains a challenge. This study examines and compares various neural network architectures and training strategies for automatic segmentation of cancer lesions using PET/CT images from the head, neck, and whole body. The authors analyzed datasets from the AutoPET and HECKTOR challenges, exploring popular single-step segmentation architectures and presenting a two-step approach. The results indicate that the V-Net and nnU-Net models were the most effective for their respective datasets. The results for the HECKTOR dataset ranged from 0.75 to 0.76 for the aggregated Dice coefficient. Eliminating cancer-free cases from the AutoPET dataset was found to improve the performance of most models. In the case of AutoPET data, the average segmentation efficiency after training only on images containing cancer lesions increased from 0.55 to 0.66 for the classic Dice coefficient and from 0.65 to 0.73 for the aggregated Dice coefficient. The research demonstrates the potential of artificial intelligence in precise oncological diagnostics and may contribute to the development of more targeted and effective cancer assessment techniques.
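The difference between the classic and the aggregated Dice coefficient can be shown on a toy example. The sketch assumes the common definitions: classic Dice is computed per case and then averaged, while aggregated Dice pools intersections and volumes over all cases before dividing. Masks are represented as sets of voxel indices, and the empty-versus-empty convention is an assumption; none of the numbers come from the paper.

```python
def dice(pred, truth):
    """Classic Dice for one case; masks are sets of voxel indices."""
    if not pred and not truth:
        return 1.0  # assumed convention when both masks are empty
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def mean_dice(cases):
    """Average of per-case Dice scores."""
    return sum(dice(p, t) for p, t in cases) / len(cases)


def aggregated_dice(cases):
    """Dice computed once from intersections and volumes pooled over all cases."""
    intersection = sum(len(p & t) for p, t in cases)
    volume = sum(len(p) + len(t) for p, t in cases)
    return 2 * intersection / volume if volume else 1.0


cases = [
    ({1, 2, 3, 4}, {3, 4, 5, 6}),  # partial overlap, per-case Dice 0.5
    ({10}, set()),                 # one false-positive voxel on a lesion-free case, Dice 0
    ({20, 21}, {20, 21}),          # perfect overlap, Dice 1
]
print(mean_dice(cases), aggregated_dice(cases))
```

A single false-positive voxel on a lesion-free case costs a full zero in the per-case average but barely moves the pooled score, which is one reason the two coefficients in the abstract differ and why lesion-free cases matter for the classic variant.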

URL

https://arxiv.org/abs/2404.09761

PDF

https://arxiv.org/pdf/2404.09761.pdf
AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides

2024-04-15 12:40:12

Kewei Li, Yuqian Wu, Yutong Guo, Yinheng Li, Yusi Fan, Ruochi Zhang, Lan Huang, Fengfeng Zhou

arXiv_AI

arXiv_AI Deep_Learning Relation Knowledge Prediction Language_Model Quantitative Pose Activity
Abstract

An activity cliff (AC) is a phenomenon in which a pair of similar molecules differs by a small structural alteration but exhibits a large difference in biochemical activity. The AC of small molecules has been extensively investigated, but limited knowledge has been accumulated about the AC phenomenon in peptides with canonical amino acids. This study introduces AMPCliff, a quantitative definition and benchmarking framework for the AC phenomenon in antimicrobial peptides (AMPs) composed of canonical amino acids. A comprehensive analysis of the existing AMP dataset reveals a significant prevalence of AC within AMPs. AMPCliff quantifies the activities of AMPs by the minimum inhibitory concentration (MIC) metric, and defines 0.9 as the minimum threshold for the normalized BLOSUM62 similarity score between a pair of aligned peptides with at least two-fold MIC changes. This study establishes a benchmark dataset of paired AMPs in Staphylococcus aureus from the publicly available AMP dataset GRAMPA, and conducts a rigorous procedure to evaluate various AMP AC prediction models, including nine machine learning algorithms, four deep learning algorithms, four masked language models, and four generative language models. Our analysis reveals that these models are capable of detecting AMP AC events and that the pre-trained protein language model ESM2 demonstrates superior performance across the evaluations. The predictive performance for AMP activity cliffs remains to be further improved, considering that ESM2 with 33 layers achieves a Spearman correlation coefficient of only 0.50 for the regression task of the MIC values on the benchmark dataset. Source code and additional resources are available at this https URL or this https URL.
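The pair criterion stated in the abstract (similarity of at least 0.9 together with at least a two-fold MIC change) can be written as a small predicate. The sketch takes the normalized BLOSUM62 similarity as a precomputed input, because the abstract does not give the normalization formula; the function names and the example values are hypothetical.

```python
def mic_fold_change(mic_a, mic_b):
    """Ratio of the larger MIC to the smaller one; both MICs must be positive."""
    low, high = sorted((mic_a, mic_b))
    return high / low


def is_activity_cliff(similarity, mic_a, mic_b, min_similarity=0.9, min_fold=2.0):
    """True when two aligned peptides are similar enough yet differ enough in MIC."""
    similar = similarity >= min_similarity
    active_gap = mic_fold_change(mic_a, mic_b) >= min_fold
    return similar and active_gap


# Hypothetical pairs: (normalized similarity, MIC of peptide A, MIC of peptide B).
print(is_activity_cliff(0.95, 4.0, 16.0))  # similar sequences, 4-fold MIC change
print(is_activity_cliff(0.95, 4.0, 6.0))   # similar sequences, only 1.5-fold change
print(is_activity_cliff(0.80, 4.0, 16.0))  # large MIC change, but sequences too different
```

Keeping the thresholds as keyword arguments makes it easy to study how the number of detected cliff pairs changes under stricter or looser definitions.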

URL

https://arxiv.org/abs/2404.09738

PDF

https://arxiv.org/pdf/2404.09738.pdf
Equipping Diffusion Models with Differentiable Spatial Entropy for Low-Light Image Enhancement

2024-04-15 12:35:10

Wenyi Lian, Wenjing Lian, Ziwei Luo

arXiv_CV

arXiv_CV Image_Enhancement Deep_Learning Face Prediction Pose Restoration Enhancement Matching Diffusion
Abstract

Image restoration, which aims to recover high-quality images from their corrupted counterparts, often faces the challenge of being an ill-posed problem that allows multiple solutions for a single input. However, most deep learning based works simply employ l1 loss to train their network in a deterministic way, resulting in over-smoothed predictions with inferior perceptual quality. In this work, we propose a novel method that shifts the focus from a deterministic pixel-by-pixel comparison to a statistical perspective, emphasizing the learning of distributions rather than individual pixel values. The core idea is to introduce spatial entropy into the loss function to measure the distribution difference between predictions and targets. To make this spatial entropy differentiable, we employ kernel density estimation (KDE) to approximate the probabilities for specific intensity values of each pixel with their neighbor areas. Specifically, we equip the entropy with diffusion models and aim for superior accuracy and enhanced perceptual quality over l1 based noise matching loss. In the experiments, we evaluate the proposed method for low light enhancement on two datasets and the NTIRE challenge 2024. All these results illustrate the effectiveness of our statistic-based entropy loss. Code is available at this https URL.
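The role of kernel density estimation here is to replace hard histogram bins, which have zero gradient almost everywhere, with smooth Gaussian weights. The sketch below computes such a soft histogram and its entropy for a flattened patch of intensities in [0, 1]. It is written in plain Python to show the arithmetic only; the bin centers, the bandwidth, and the `entropy_gap` comparison are assumptions, not the loss used in the paper.

```python
import math


def soft_histogram(values, centers, bandwidth):
    """Gaussian-KDE estimate of the probability mass near each bin center."""
    weights = [
        sum(math.exp(-0.5 * ((v - c) / bandwidth) ** 2) for v in values)
        for c in centers
    ]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs, eps=1e-12):
    """Shannon entropy in nats; eps avoids log(0)."""
    return -sum(p * math.log(p + eps) for p in probs)


def entropy_gap(prediction, target, centers, bandwidth):
    """Hypothetical loss term: absolute difference between the two patch entropies."""
    h_pred = entropy(soft_histogram(prediction, centers, bandwidth))
    h_target = entropy(soft_histogram(target, centers, bandwidth))
    return abs(h_pred - h_target)


centers = [i / 7 for i in range(8)]  # 8 intensity bins on [0, 1]
flat_patch = [0.5] * 9               # over-smoothed 3x3 patch
varied_patch = [i / 8 for i in range(9)]  # textured 3x3 patch

entropy_flat = entropy(soft_histogram(flat_patch, centers, 0.05))
entropy_varied = entropy(soft_histogram(varied_patch, centers, 0.05))
print(entropy_flat, entropy_varied, entropy_gap(flat_patch, varied_patch, centers, 0.05))
```

Because every step is a smooth function of the pixel values, the same computation expressed in an autodiff framework would provide gradients, which a hard-binned histogram would not.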

URL

https://arxiv.org/abs/2404.09735

PDF

https://arxiv.org/pdf/2404.09735.pdf
Post-Training Network Compression for 3D Medical Image Segmentation: Reducing Computational Efforts via Tucker Decomposition

2024-04-15 11:36:31

Tobias Weber, Jakob Dexl, David Rügamer, Michael Ingrisch

arXiv_CV

arXiv_CV Segmentation CNN Deep_Learning Relation Inference Pose Medical 3D
Abstract

We address the computational barrier of deploying advanced deep learning segmentation models in clinical settings by studying the efficacy of network compression through tensor decomposition. We propose a post-training Tucker factorization that enables the decomposition of pre-existing models to reduce computational requirements without impeding segmentation accuracy. We applied Tucker decomposition to the convolutional kernels of the TotalSegmentator (TS) model, an nnU-Net model trained on a comprehensive dataset for automatic segmentation of 117 anatomical structures. Our approach reduced the floating-point operations (FLOPs) and memory required during inference, offering an adjustable trade-off between computational efficiency and segmentation quality. This study utilized the publicly available TS dataset, employing various downsampling factors to explore the relationship between model size, inference speed, and segmentation performance. The application of Tucker decomposition to the TS model substantially reduced the model parameters and FLOPs across various compression rates, with limited loss in segmentation accuracy. We removed up to 88% of the model's parameters with no significant performance changes in the majority of classes after fine-tuning. Practical benefits varied across different graphics processing unit (GPU) architectures, with more distinct speed-ups on less powerful hardware. Post-hoc network compression via Tucker decomposition presents a viable strategy for reducing the computational demand of medical image segmentation models without substantially sacrificing accuracy. This approach enables the broader adoption of advanced deep learning technologies in clinical practice, offering a way to navigate the constraints of hardware capabilities.
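A rough parameter count shows where the savings from factorizing a convolutional kernel come from. The sketch assumes a Tucker-2 factorization along the two channel modes of a 3D convolution, which turns one layer into a 1x1x1 convolution, a smaller k x k x k core convolution, and another 1x1x1 convolution. The abstract does not state which modes are decomposed, and the layer sizes and ranks below are hypothetical; biases are ignored.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a dense 3D convolution with a k x k x k kernel."""
    return c_in * c_out * k ** 3


def tucker2_conv3d_params(c_in, c_out, k, r_in, r_out):
    """Weights after a Tucker-2 factorization with channel ranks r_in and r_out."""
    input_projection = c_in * r_in        # 1x1x1 conv: c_in -> r_in
    core = r_in * r_out * k ** 3          # k x k x k conv: r_in -> r_out
    output_projection = r_out * c_out     # 1x1x1 conv: r_out -> c_out
    return input_projection + core + output_projection


# Hypothetical layer: 256 -> 256 channels, 3x3x3 kernel, ranks 64 and 64.
full = conv3d_params(256, 256, 3)
compressed = tucker2_conv3d_params(256, 256, 3, 64, 64)
reduction = 1 - compressed / full
print(full, compressed, f"{reduction:.1%} fewer weights")
```

The ranks act as the adjustable knob mentioned in the abstract: lower ranks cut parameters and FLOPs further at the risk of a larger accuracy loss.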

URL

https://arxiv.org/abs/2404.09683

PDF

https://arxiv.org/pdf/2404.09683.pdf
CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning

2024-04-15 10:19:39

Haojian Huang, Xiaozhen Qiao, Zhuo Chen, Haodong Chen, Bingyu Li, Zhe Sun, Mulin Chen, Xuelong Li

arXiv_CV

arXiv_CV Recognition Deep_Learning Relation Inference Knowledge Pose Zero-Shot
Abstract

Zero-shot learning (ZSL) enables the recognition of novel classes by leveraging semantic knowledge transfer from known to unknown categories. This knowledge, typically encapsulated in attribute descriptions, aids in identifying class-specific visual features, thus facilitating visual-semantic alignment and improving ZSL performance. However, real-world challenges such as distribution imbalances and attribute co-occurrence among instances often hinder the discernment of local variances in images, a problem exacerbated by the scarcity of fine-grained, region-specific attribute annotations. Moreover, the variability in visual presentation within categories can also skew attribute-category associations. In response, we propose a bidirectional cross-modal ZSL approach CREST. It begins by extracting representations for attribute and visual localization and employs Evidential Deep Learning (EDL) to measure underlying epistemic uncertainty, thereby enhancing the model's resilience against hard negatives. CREST incorporates dual learning pathways, focusing on both visual-category and attribute-category alignments, to ensure robust correlation between latent and observable spaces. Moreover, we introduce an uncertainty-informed cross-modal fusion technique to refine visual-attribute inference. Extensive experiments demonstrate our model's effectiveness and unique explainability across multiple datasets. Our code and data are available at this https URL. Comments: ongoing work; 10 pages, 2 tables, 9 figures.

URL

https://arxiv.org/abs/2404.09640

PDF

https://arxiv.org/pdf/2404.09640.pdf
MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution

2024-04-15 08:32:41

Yuxuan Jiang, Chen Feng, Fan Zhang, David Bull

arXiv_CV

arXiv_CV Deep_Learning Super_Resolution Knowledge Pose
Abstract

Knowledge distillation (KD) has emerged as a promising technique in deep learning, typically employed to enhance a compact student network through learning from its high-performance but more complex teacher variant. When applied in the context of image super-resolution, most KD approaches are modified versions of methods developed for other computer vision tasks, which are based on training strategies with a single teacher and simple loss functions. In this paper, we propose a novel Multi-Teacher Knowledge Distillation (MTKD) framework specifically for image super-resolution. It exploits the advantages of multiple teachers by combining and enhancing the outputs of these teacher models, which then guide the learning process of the compact student network. To achieve more effective learning, we have also developed a new wavelet-based loss function for MTKD, which can better optimize the training process by observing differences in both the spatial and frequency domains. We fully evaluate the effectiveness of the proposed method by comparing it to five commonly used KD methods for image super-resolution based on three popular network architectures. The results show that the proposed MTKD method achieves evident improvements in super-resolution performance, up to 0.46dB (based on PSNR), over state-of-the-art KD approaches across different network structures. The source code of MTKD will be made available here for public evaluation.

URL

https://arxiv.org/abs/2404.09571

PDF

https://arxiv.org/pdf/2404.09571.pdf
WiTUnet: A U-Shaped Architecture Integrating CNN and Transformer for Improved Feature Alignment and Local Information Fusion

2024-04-15 07:53:07

Bin Wang, Fei Deng, Peifan Jiang, Shuang Wang, Xiao Han, Hongjie Zheng

arXiv_AI

arXiv_AI CNN Deep_Learning Transformer Denoising Medical Enhancement
Abstract

Low-dose computed tomography (LDCT) has become the technology of choice for diagnostic medical imaging, given its lower radiation dose compared to standard CT, despite increasing image noise and potentially affecting diagnostic accuracy. To address this, advanced deep learning-based LDCT denoising algorithms have been developed, primarily using Convolutional Neural Networks (CNNs) or Transformer Networks with the Unet architecture. This architecture enhances image detail by integrating feature maps from the encoder and decoder via skip connections. However, current methods often overlook enhancements to the Unet architecture itself, focusing instead on optimizing encoder and decoder structures. This approach can be problematic due to the significant differences in feature map characteristics between the encoder and decoder, where simple fusion strategies may not effectively reconstruct images. In this paper, we introduce WiTUnet, a novel LDCT image denoising method that utilizes nested, dense skip pathways instead of traditional skip connections to improve feature integration. WiTUnet also incorporates a windowed Transformer structure to process images in smaller, non-overlapping segments, reducing computational load. Additionally, the integration of a Local Image Perception Enhancement (LiPe) module in both the encoder and decoder replaces the standard multi-layer perceptron (MLP) in Transformers, enhancing local feature capture and representation. Through extensive experimental comparisons, WiTUnet has demonstrated superior performance over existing methods in key metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Root Mean Square Error (RMSE), significantly improving noise removal and image quality.
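Two of the metrics named above, RMSE and PSNR, are closely related and simple to compute. The sketch below uses flattened lists of pixel intensities in [0, 1]; the `data_range` default and the toy pixel values are assumptions for illustration, and SSIM is omitted because it needs windowed statistics.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between two equally sized pixel lists."""
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = rmse(reference, estimate)
    if error == 0:
        return math.inf
    return 20 * math.log10(data_range / error)


# Hypothetical 2x2 image, flattened: clean reference, noisy input, denoised output.
reference = [0.0, 0.5, 1.0, 0.5]
noisy = [0.1, 0.4, 1.0, 0.6]
denoised = [0.0, 0.5, 1.0, 0.55]

print(f"noisy:    RMSE={rmse(reference, noisy):.4f}, PSNR={psnr(reference, noisy):.2f} dB")
print(f"denoised: RMSE={rmse(reference, denoised):.4f}, PSNR={psnr(reference, denoised):.2f} dB")
```

Lower RMSE and higher PSNR both indicate an output closer to the reference, so a denoiser that helps should move the two metrics in opposite directions, as it does here.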

Abstract (translated)

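Low-dose CT (LDCT) has become the preferred technology for diagnostic medical imaging because its radiation dose is lower than that of standard CT, although the increased image noise may affect diagnostic accuracy. To address this problem, advanced deep-learning-based LDCT denoising algorithms have been developed, mainly using convolutional neural networks (CNNs) or Transformer networks with the Unet architecture. This architecture enhances image detail by integrating encoder and decoder feature maps through skip connections. However, current methods usually neglect improvements to the Unet architecture itself and focus instead on optimizing the encoder and decoder structures. This is problematic because encoder and decoder feature maps differ significantly, so simple fusion strategies may not reconstruct images effectively. We introduce WiTUnet, a novel LDCT image denoising method that uses nested, dense skip pathways instead of traditional skip connections to improve feature integration. WiTUnet also introduces a windowed Transformer structure that processes images in smaller, non-overlapping segments, reducing the computational load. In addition, a Local Image Perception Enhancement (LiPe) module in both the encoder and decoder replaces the standard multi-layer perceptron (MLP) in the Transformer, strengthening local feature capture and representation. In extensive experimental comparisons, WiTUnet outperforms existing methods on key metrics such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and root mean square error (RMSE), significantly improving noise removal and image quality.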

URL

https://arxiv.org/abs/2404.09533

PDF

https://arxiv.org/pdf/2404.09533.pdf
RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization

2024-04-15 07:50:15

Avinash Anand, Raj Jaiswal, Mohit Gupta, Siddhesh S Bangar, Pijush Bhuyan, Naman Lal, Rajeev Singh, Ritika Jha, Rajiv Ratn Shah, Shin'ichi Satoh

arXiv_AI

arXiv_AI Detection Deep_Learning Inference
Abstract

Large ground-truth datasets and recent advances in deep learning techniques have been useful for layout detection. However, because of the restricted layout diversity of these datasets, training on them requires a sizable number of annotated instances, which is both expensive and time-consuming. As a result, differences between the source and target domains may significantly impact how well these models function. To solve this problem, domain adaptation approaches have been developed that use a small quantity of labeled data to adjust the model to the target domain. In this research, we introduced a synthetic document dataset called RanLayNet, enriched with automatically assigned labels denoting spatial positions, ranges, and types of layout elements. The primary aim of this endeavor is to develop a versatile dataset capable of training models with robustness and adaptability to diverse document formats. Through empirical experimentation, we demonstrate that a deep layout identification model trained on our dataset exhibits enhanced performance compared to a model trained solely on actual documents. Moreover, we conduct a comparative analysis by fine-tuning inference models using both PubLayNet and IIIT-AR-13K datasets on the Doclaynet dataset. Our findings emphasize that models enriched with our dataset are optimal for tasks such as achieving 0.398 and 0.588 mAP95 score in the scientific document domain for the TABLE class.

Abstract (translated)

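Large ground-truth datasets and recent advances in deep learning have been very useful for layout detection. However, because the layout diversity of these datasets is limited, training on them requires a large number of annotated instances, which is both costly and time-consuming. As a result, differences between the source and target domains can significantly affect how well these models work. To solve this problem, domain adaptation methods have been developed that use a small amount of labeled data to adjust the model to the target domain. In this study, we introduce a synthetic document dataset called RanLayNet, with automatically assigned labels that denote the spatial positions, ranges, and types of layout elements. The main goal of this effort is to develop a versatile dataset that can train models to be robust and adaptable to diverse document formats. Through experiments, we show that a deep layout identification model trained on our dataset performs better than a model trained only on real documents. In addition, we carry out a comparative analysis by fine-tuning inference models that use the PubLayNet and IIIT-AR-13K datasets on the Doclaynet dataset. Our results emphasize that models enriched with our dataset achieve mAP95 scores of 0.398 and 0.588 for the TABLE class in the scientific document domain.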

URL

https://arxiv.org/abs/2404.09530

PDF

https://arxiv.org/pdf/2404.09530.pdf
State Space Model for New-Generation Network Alternative to Transformers: A Survey

2024-04-15 07:24:45

Xiao Wang, Shiao Wang, Yuhe Ding, Yuehang Li, Wentao Wu, Yao Rong, Weizhe Kong, Ju Huang, Shihao Li, Haoxiang Yang, Ziwen Wang, Bo Jiang, Chenglong Li, Yaowei Wang, Yonghong Tian, Jin Tang

arXiv_AI

arXiv_AI Deep_Learning Review Survey Attention Transformer Pose Point_Cloud
Abstract

In the post-deep learning era, the Transformer architecture has demonstrated its powerful performance across pre-trained big models and various downstream tasks. However, the enormous computational demands of this architecture have deterred many researchers. To further reduce the complexity of attention models, numerous efforts have been made to design more efficient methods. Among them, the State Space Model (SSM), as a possible replacement for the self-attention based Transformer model, has drawn more and more attention in recent years. In this paper, we give the first comprehensive review of these works and also provide experimental comparisons and analysis to better demonstrate the features and advantages of SSM. Specifically, we first give a detailed description of principles to help the readers quickly capture the key ideas of SSM. After that, we dive into the reviews of existing SSMs and their various applications, including natural language processing, computer vision, graph, multi-modal and multi-media, point cloud/event stream, time series data, and other domains. In addition, we give statistical comparisons and analysis of these models and hope it helps the readers to understand the effectiveness of different structures on various tasks. Then, we propose possible research points in this direction to better promote the development of the theoretical model and application of SSM. More related works will be continuously updated on the following GitHub: this https URL.
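
To make the contrast with self-attention concrete, here is a minimal sketch of the discrete linear recurrence that underlies a state space model. It is a scalar toy, not any specific model from the survey: the function name `ssm_scan` and the parameters `a`, `b`, `c`, and `h0` are assumptions for illustration.

```python
def ssm_scan(inputs, a, b, c, h0=0.0):
    """Run the scalar recurrence h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.

    Hypothetical toy: practical SSMs use vector states and learned matrices.
    """
    h = h0
    outputs = []
    for x in inputs:
        h = a * h + b * x      # update the hidden state from the new input
        outputs.append(c * h)  # read the output from the state
    return outputs


# An impulse followed by zeros decays geometrically through the state.
impulse_response = ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0)
```

The loop touches each input once, so the work grows linearly with sequence length, whereas full self-attention compares every pair of positions.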

Abstract (translated)

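In the post-deep-learning era, the Transformer architecture has shown powerful performance across pre-trained large models and a variety of downstream tasks. However, the enormous computational demands of this architecture have deterred many researchers. To further reduce the complexity of attention models, many efforts have been made to design more efficient methods. Among them, the State Space Model (SSM), as a possible replacement for the self-attention-based Transformer model, has attracted growing attention in recent years. In this paper, we give a comprehensive review of these works and provide experimental comparisons and analysis to better show the features and advantages of SSM. Specifically, we first describe the principles in detail to help readers quickly grasp the key ideas of SSM. We then review existing SSMs and their various applications, including natural language processing, computer vision, graphs, multi-modal and multi-media, point cloud/event stream, time series data, and other domains. In addition, we give statistical comparisons and analysis of these models, which we hope will help readers understand the effectiveness of different structures on various tasks. Finally, we propose possible research directions to better promote the development of the theoretical model and the application of SSM. More related works will be continuously updated on the following GitHub: https://github.com/.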

URL

https://arxiv.org/abs/2404.09516

PDF

https://arxiv.org/pdf/2404.09516.pdf
Experimental Analysis of Deep Hedging Using Artificial Market Simulations for Underlying Asset Simulators

2024-04-15 05:11:07

Masanori Hirano

arXiv_AI

arXiv_AI Deep_Learning Pose
Abstract

Derivative hedging and pricing are important and continuously studied topics in financial markets. Recently, deep hedging has been proposed as a promising approach that uses deep learning to approximate the optimal hedging strategy and can handle incomplete markets. However, deep hedging usually requires underlying asset simulations, and it is challenging to select the best model for such simulations. This study proposes a new approach using artificial market simulations for underlying asset simulations in deep hedging. Artificial market simulations can replicate the stylized facts of financial markets, and they seem to be a promising approach for deep hedging. We investigate the effectiveness of the proposed approach by comparing its results with those of the traditional approach, which uses mathematical finance models such as Brownian motion and Heston models for underlying asset simulations. The results show that the proposed approach can achieve almost the same level of performance as the traditional approach without mathematical finance models. Finally, we also reveal that the proposed approach has some limitations in terms of performance under certain conditions.
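
For context, the traditional simulators mentioned in the abstract can be as simple as a Brownian-motion price model. The sketch below assumes geometric Brownian motion with hypothetical parameters (`s0`, `mu`, `sigma`, `dt`); the abstract does not say which variant the study uses, so this is illustrative only and is not the author's code.

```python
import math
import random


def simulate_gbm_path(s0, mu, sigma, dt, steps, rng):
    """Simulate one geometric Brownian motion price path.

    Hypothetical helper: s0 is the initial price, mu the drift, sigma the
    volatility, dt the step size in years, rng a random.Random instance.
    """
    path = [s0]
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)  # standard normal shock for this step
        growth = (mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z
        path.append(path[-1] * math.exp(growth))
    return path


# One year of daily prices (252 trading days assumed) from a seeded generator.
example_path = simulate_gbm_path(100.0, 0.05, 0.2, 1.0 / 252, 252, random.Random(0))
```

A deep hedging setup would draw many such paths and train the hedging network on them; an artificial market simulation would replace this function as the source of price paths.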

Abstract (translated)

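Derivative hedging and pricing are important and continuously studied topics in financial markets. Recently, deep hedging has been proposed as a promising approach that uses deep learning to approximate the optimal hedging strategy and can handle incomplete markets. However, deep hedging usually requires simulations of the underlying asset, and it is difficult to select the best model for these simulations. This study proposes a new approach that uses artificial market simulations for the underlying asset simulations. Artificial market simulations can replicate the stylized facts of financial markets and appear to be a promising approach for deep hedging. We investigate the effectiveness of the proposed approach by comparing its results with those of the traditional approach, which uses mathematical finance models such as Brownian motion and Heston models. The results show that the proposed approach can achieve almost the same level of performance as the traditional approach without mathematical finance models. Finally, we also find that the proposed approach has some performance limitations under certain conditions.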

URL

https://arxiv.org/abs/2404.09462

PDF

https://arxiv.org/pdf/2404.09462.pdf
Privacy at a Price: Exploring its Dual Impact on AI Fairness

2024-04-15 00:23:41

Mengmeng Yang, Ming Ding, Youyang Qu, Wei Ni, David Smith, Thierry Rakotoarivelo

arXiv_AI

arXiv_AI Gradient_Descent Deep_Learning Prediction
Abstract

The worldwide adoption of machine learning (ML) and deep learning models, particularly in critical sectors, such as healthcare and finance, presents substantial challenges in maintaining individual privacy and fairness. These two elements are vital to a trustworthy environment for learning systems. While numerous studies have concentrated on protecting individual privacy through differential privacy (DP) mechanisms, emerging research indicates that differential privacy in machine learning models can unequally impact separate demographic subgroups regarding prediction accuracy. This leads to a fairness concern, and manifests as biased performance. Although the prevailing view is that enhancing privacy intensifies fairness disparities, a smaller, yet significant, subset of research suggests the opposite view. In this article, with extensive evaluation results, we demonstrate that the impact of differential privacy on fairness is not monotonic. Instead, we observe that the accuracy disparity initially grows as more DP noise (enhanced privacy) is added to the ML process, but subsequently diminishes at higher privacy levels with even more noise. Moreover, implementing gradient clipping in the differentially private stochastic gradient descent ML method can mitigate the negative impact of DP noise on fairness. This mitigation is achieved by moderating the disparity growth through a lower clipping threshold.
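
As a rough illustration of the gradient clipping step discussed in the abstract, the sketch below follows the commonly described DP-SGD recipe: clip each per-example gradient to a norm bound, sum, add Gaussian noise scaled by that bound, and average. It is not the authors' code; `clip_gradient`, `dp_sgd_gradient`, `clip_norm`, and `noise_multiplier` are hypothetical names.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Rescale a gradient vector so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return the noisy averaged gradient for one DP-SGD step (sketch).

    Noise standard deviation is noise_multiplier * clip_norm, an assumption
    that matches the usual formulation; rng is a random.Random instance.
    """
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * clip_norm) for s in summed]
    return [v / len(clipped) for v in noisy]


# Two toy per-example gradients; the first exceeds the bound and is scaled down.
toy_grads = [[3.0, 4.0], [0.3, 0.4]]
noisy_mean = dp_sgd_gradient(toy_grads, clip_norm=1.0, noise_multiplier=1.1,
                             rng=random.Random(0))
```

In this formulation a lower `clip_norm` shrinks both the largest contribution any single example can make and the standard deviation of the added noise, which is the knob the abstract refers to as the clipping threshold.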

Abstract (translated)

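The worldwide adoption of machine learning (ML) and deep learning models, especially in critical sectors such as healthcare and finance, poses substantial challenges for maintaining individual privacy and fairness. These two elements are vital to a trustworthy environment for learning systems. Although many studies protect individual privacy through differential privacy (DP) mechanisms, emerging research indicates that differential privacy in machine learning models can affect the prediction accuracy of different demographic subgroups unequally. This raises a fairness concern and shows up as biased performance. Although the prevailing view is that enhancing privacy intensifies fairness disparities, a smaller but significant body of research suggests the opposite. In this article, using extensive evaluation results, we show that the impact of differential privacy on fairness is not monotonic. Instead, we observe that the accuracy disparity initially grows as more DP noise (enhanced privacy) is added to the ML process, but then diminishes at higher privacy levels with even more noise. In addition, implementing gradient clipping in the differentially private stochastic gradient descent ML method can mitigate the negative impact of DP noise on fairness. This mitigation is achieved by moderating the disparity growth through a lower clipping threshold.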

URL

https://arxiv.org/abs/2404.09391

PDF

https://arxiv.org/pdf/2404.09391.pdf
Breast Cancer Image Classification Method Based on Deep Transfer Learning

2024-04-14 12:09:47

Weimin Wang, Min Gao, Mingxuan Xiao, Xu Yan, Yufeng Li

arXiv_CV

arXiv_CV Detection Deep_Learning Classification Image_Classification Attention Transfer_Learning Pose Medical
Abstract

To address the issues of limited samples, time-consuming feature design, and low accuracy in detection and classification of breast cancer pathological images, a breast cancer image classification model algorithm combining deep learning and transfer learning is proposed. This algorithm is based on the DenseNet structure of deep neural networks, and constructs a network model by introducing attention mechanisms, and trains the enhanced dataset using multi-level transfer learning. Experimental results demonstrate that the algorithm achieves an efficiency of over 84.0% in the test set, with a significantly improved classification accuracy compared to previous models, making it applicable to medical breast cancer detection tasks.

Abstract (translated)

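To address the problems of limited samples, time-consuming feature design, and low accuracy in the detection and classification of breast cancer images, we propose a breast cancer image classification model algorithm that combines deep learning and transfer learning. The algorithm is based on the DenseNet structure of deep neural networks, builds a network model by introducing attention mechanisms, and trains on the enhanced dataset using multi-level transfer learning. Experimental results show that the algorithm achieves an efficiency of over 84.0% on the test set, with significantly improved classification accuracy compared to previous models, making it applicable to medical breast cancer detection tasks.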

URL

https://arxiv.org/abs/2404.09226

PDF

https://arxiv.org/pdf/2404.09226.pdf
