This study presents an ensemble-based approach for cocoa pod disease classification by integrating transfer learning with three ensemble learning strategies: Bagging, Boosting, and Stacking. Pre-trained convolutional neural networks, including VGG16, VGG19, ResNet50, ResNet101, InceptionV3, and Xception, were fine-tuned and employed as base learners to detect three disease categories: Black Pod Rot, Pod Borer, and Healthy. A balanced dataset of 6,000 cocoa pod images was curated and augmented to ensure robustness against variations in lighting, orientation, and disease severity. The performance of each ensemble method was evaluated using accuracy, precision, recall, and F1-score. Experimental results show that Bagging consistently achieved superior classification performance with a test accuracy of 100%, outperforming Boosting (97%) and Stacking (92%). The findings confirm that combining transfer learning with ensemble techniques improves model generalization and reliability, making it a promising direction for precision agriculture and automated crop disease management.
https://arxiv.org/abs/2504.12992
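As a rough illustration of the Bagging strategy named in the abstract above, the sketch below draws bootstrap samples and combines per-model predictions by majority vote. It is not the authors' implementation: the function names, the tie-breaking rule, and the hard-coded predictions standing in for fine-tuned CNN outputs are all assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(items, rng):
    """Draw len(items) elements with replacement (one bag per base learner)."""
    return rng.choices(items, k=len(items))


def majority_vote(predictions):
    """Return the most common label; ties go to the label listed first."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def bagging_predict(per_model_predictions):
    """Combine predictions from several models, one vote per model per image."""
    # zip(*...) regroups the per-model lists into per-image tuples of votes.
    return [majority_vote(list(votes)) for votes in zip(*per_model_predictions)]


# Hypothetical predictions of three base learners on three images.
votes = [
    ["Healthy", "Pod Borer", "Black Pod Rot"],    # hypothetical model A
    ["Healthy", "Healthy", "Black Pod Rot"],      # hypothetical model B
    ["Pod Borer", "Pod Borer", "Black Pod Rot"],  # hypothetical model C
]
ensemble = bagging_predict(votes)

# Each base learner would be trained on its own bootstrap sample of image ids.
bag = bootstrap_sample(list(range(10)), random.Random(0))
```

In a real pipeline each base learner would be trained on its own bag before voting; the sketch only shows the sampling and aggregation steps.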
Deep learning (DL)-based image classification models are essential for autonomous vehicle (AV) perception modules, since incorrect categorization can have severe repercussions. Adversarial attacks are widely studied cyberattacks that can lead DL models to produce inaccurate predictions, such as traffic signs misclassified by the perception module of an autonomous vehicle. In this study, we create hybrid classical-quantum deep learning (HCQ-DL) models and compare them with classical deep learning (C-DL) models to demonstrate robustness against adversarial attacks for perception modules. We used transfer learning models, alexnet and vgg-16, as feature extractors before feeding the extracted features into the quantum system. We tested over 1000 quantum circuits in our HCQ-DL models against projected gradient descent (PGD), fast gradient sign attack (FGSA), and gradient attack (GA), three well-known untargeted adversarial approaches. We evaluated the performance of all models under adversarial attack and no-attack scenarios. Our HCQ-DL models maintain accuracy above 95% in the no-attack scenario and above 91% under GA and FGSA attacks, which is higher than the C-DL models. Under the PGD attack, our alexnet-based HCQ-DL model maintained an accuracy of 85%, compared to C-DL models that achieved accuracies below 21%. Our results highlight that the HCQ-DL models provide improved accuracy for traffic sign classification under adversarial settings compared to their classical counterparts.
https://arxiv.org/abs/2504.12644
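To make the fast gradient sign attack mentioned in the abstract above more concrete, here is a minimal sketch on a toy logistic-regression classifier rather than the paper's HCQ-DL or C-DL models. The model, its weights, the input, and the step size `eps` are all hypothetical; only the update rule (move the input by `eps` in the direction of the sign of the loss gradient) reflects the general technique.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic model with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w, b, x, y, eps):
    """One-step untargeted attack: shift x by eps along the sign of the loss gradient."""
    p = predict_proba(w, b, x)
    # For binary cross-entropy on a logistic model, dLoss/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model and input with true label 1.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fgsm(w, b, x, y, eps=0.5)
p_clean = predict_proba(w, b, x)
p_adv = predict_proba(w, b, x_adv)
```

With these toy numbers the clean input is classified as class 1, while the perturbed input falls on the other side of the decision boundary. PGD can be viewed as repeating this step several times with a projection back into an `eps`-ball around the original input.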
Spatial imbalances in crop type data pose significant challenges for accurate classification in remote sensing applications. Algorithms aiming at transferring knowledge from data-rich to data-scarce tasks have thus surged in popularity. However, despite their effectiveness in previous evaluations, their performance in challenging real-world applications is unclear and needs to be evaluated. This study benchmarks transfer learning and several meta-learning algorithms, including (First-Order) Model-Agnostic Meta-Learning ((FO)-MAML), Almost No Inner Loop (ANIL), and Task-Informed Meta-Learning (TIML), on the real-world EuroCropsML time series dataset, which combines farmer-reported crop data with Sentinel-2 satellite observations from Estonia, Latvia, and Portugal. Our findings indicate that MAML-based meta-learning algorithms achieve slightly higher accuracy compared to simpler transfer learning methods when applied to crop type classification tasks in Estonia after pre-training on data from Latvia. However, this improvement comes at the cost of increased computational demands and training time. Moreover, we find that the transfer of knowledge between geographically disparate regions, such as Estonia and Portugal, poses significant challenges to all investigated algorithms. These insights underscore the trade-offs between accuracy and computational resource requirements in selecting machine learning methods for real-world crop type classification tasks and highlight the difficulties of transferring knowledge between different regions of the Earth. To facilitate future research in this domain, we present the first comprehensive benchmark for evaluating transfer and meta-learning methods for crop type classification under real-world conditions. The corresponding code is publicly available.
https://arxiv.org/abs/2504.11022
Link prediction on graphs has applications spanning from recommender systems to drug discovery. Temporal link prediction (TLP) refers to predicting future links in a temporally evolving graph and adds complexity related to the dynamic nature of graphs. State-of-the-art TLP models incorporate memory modules alongside graph neural networks to learn both the temporal mechanisms of incoming nodes and the evolving graph topology. However, memory modules only store information about nodes seen at train time, and hence such models cannot be directly transferred to entirely new graphs at test time and deployment. In this work, we study a new transfer learning task for temporal link prediction, and develop transfer-effective methods for memory-laden models. Specifically, motivated by work showing the informativeness of structural signals for the TLP task, we augment existing TLP model architectures with a structural mapping module, which learns a mapping from graph structural (topological) features to memory embeddings. Our work paves the way for a memory-free foundation model for TLP.
https://arxiv.org/abs/2504.10925
Text-based Person Retrieval (TPR), a multi-modal task that aims to retrieve the target person from a pool of candidate images given a text description, has recently garnered considerable attention due to the progress of contrastive visual-language pre-trained models. Prior works leverage pre-trained CLIP to extract person visual and textual features and fully fine-tune the entire network, which has shown notable performance improvements compared to uni-modal pre-training models. However, fully fine-tuning a large model is prone to overfitting and hinders generalization. In this paper, we propose a novel Unified Parameter-Efficient Transfer Learning (PETL) method for Text-based Person Retrieval (UP-Person) to thoroughly transfer the multi-modal knowledge from CLIP. Specifically, UP-Person simultaneously integrates three lightweight PETL components, Prefix, LoRA and Adapter, where Prefix and LoRA are devised together to mine local information with task-specific information prompts, and Adapter is designed to adjust global feature representations. Additionally, two vanilla submodules are optimized to adapt to the unified architecture of TPR. First, S-Prefix is proposed to boost the attention of the prefix and enhance the gradient propagation of prefix tokens, which improves the flexibility and performance of the vanilla prefix. Second, L-Adapter is designed in parallel with layer normalization to adjust the overall distribution, which can resolve conflicts caused by overlap and interaction among multiple submodules. Extensive experimental results demonstrate that UP-Person achieves state-of-the-art results across various person retrieval datasets, including CUHK-PEDES, ICFG-PEDES and RSTPReid, while fine-tuning merely 4.7% of the parameters. Code is publicly available.
https://arxiv.org/abs/2504.10084
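Of the three PETL components listed in the abstract above, LoRA is the easiest to show in a few lines. The sketch below is a generic illustration of the low-rank update, not UP-Person's code: it assumes the common formulation in which a frozen weight matrix W is adjusted by a scaled product of two small trainable matrices, and all names, shapes, and values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, where A is r x d_in and B is d_out x r."""
    r = len(a)  # rank of the low-rank update
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [wij + scale * dij for wij, dij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_trainable_params(d_in, d_out, r):
    """Only A and B are trained; the frozen W contributes no trainable parameters."""
    return r * (d_in + d_out)


# Hypothetical 2 x 2 frozen weight with a rank-1 update.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]    # r x d_in
b = [[0.5], [0.0]]  # d_out x r
w_eff = lora_effective_weight(w, a, b, alpha=2.0)
```

For a 768 x 768 layer with rank 8, `lora_trainable_params(768, 768, 8)` gives 12,288 trainable values against 589,824 in the frozen matrix, which is the kind of saving that makes fine-tuning a small fraction of the parameters possible.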
In this paper, we explore how conventional image enhancement can improve model robustness in medical image analysis. By applying commonly used normalization methods to images from various vendors and studying their influence on model generalization in transfer learning, we show that the nonlinear characteristics of domain-specific image dynamics cannot be addressed by simple linear transforms. To tackle this issue, we reformulate the image harmonization task as an exposure correction problem and propose a method termed Global Deep Curve Estimation (GDCE) to reduce domain-specific exposure mismatch. GDCE performs enhancement via a pre-defined polynomial function and is trained with the help of a "domain discriminator", aiming to improve model transparency in downstream tasks compared to existing black-box methods.
https://arxiv.org/abs/2504.10080
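The abstract above does not state which polynomial GDCE uses, so the following sketch borrows the quadratic curve x + alpha * x * (1 - x) popularized by Zero-DCE purely as a stand-in. It also assumes that "global" means one curve parameter per image; the function name and the sample values are hypothetical.

```python
def apply_curve(pixels, alpha, iterations=1):
    """Apply x <- x + alpha * x * (1 - x) to intensities normalized to [0, 1].

    A single alpha is shared by every pixel, so the mapping is global.
    For alpha in [-1, 1] the curve is monotonic and keeps values in [0, 1].
    """
    out = list(pixels)
    for _ in range(iterations):
        out = [x + alpha * x * (1.0 - x) for x in out]
    return out


# Hypothetical underexposed intensities brightened with alpha = 1.
brightened = apply_curve([0.0, 0.5, 1.0], alpha=1.0)
# A negative alpha darkens mid-tones instead.
darkened = apply_curve([0.0, 0.5, 1.0], alpha=-1.0)
```

Because black and white are fixed points of the curve and only mid-tones move, the transform is nonlinear yet described by a single interpretable parameter, which is consistent with the transparency argument made in the abstract.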
Diabetic retinopathy (DR) is a leading cause of vision impairment, making its early diagnosis through fundus imaging critical for effective treatment planning. However, poor-quality fundus images caused by factors such as inadequate illumination, noise, blurring, and other motion artifacts pose a significant challenge for accurate DR screening. In this study, we propose progressive transfer learning (PTL) for multi-pass restoration to iteratively enhance the quality of degraded fundus images, ensuring more reliable DR screening. Unlike previous methods that often focus on single-pass restoration, multi-pass restoration via PTL can achieve superior blind restoration performance and can even improve most of the good-quality fundus images in the dataset. Initially, a CycleGAN model is trained to restore low-quality images, followed by PTL-induced restoration passes over the latest restored outputs to improve overall quality in each pass. The proposed method can learn blind restoration without requiring any paired data, while overcoming its limitations through progressive learning and fine-tuning strategies that minimize distortions and preserve critical retinal features. To evaluate PTL's effectiveness on multi-pass restoration, we conducted experiments on DeepDRiD, a large-scale fundus imaging dataset specifically curated for diabetic retinopathy detection. Our results demonstrate state-of-the-art performance, showcasing PTL's potential as a superior approach to iterative image quality restoration.
https://arxiv.org/abs/2504.10025
Whereas in general computer vision, transformer-based architectures have quickly become the gold standard, microelectronics defect detection still heavily relies on convolutional neural networks (CNNs). We hypothesize that this is due to the fact that a) transformers have an increased need for data and b) labelled image generation procedures for microelectronics are costly, and labelled data is therefore sparse. Whereas in other domains, pre-training on large natural image datasets can mitigate this problem, in microelectronics transfer learning is hindered due to the dissimilarity of domain data and natural images. Therefore, we evaluate self pre-training, where models are pre-trained on the target dataset, rather than another dataset. We propose a vision transformer (ViT) pre-training framework for defect detection in microelectronics based on masked autoencoders (MAE). In MAE, a large share of image patches is masked and reconstructed by the model during pre-training. We perform pre-training and defect detection using a dataset of fewer than 10,000 scanning acoustic microscopy (SAM) images labelled using transient thermal analysis (TTA). Our experimental results show that our approach leads to substantial performance gains compared to a) supervised ViT, b) ViT pre-trained on natural image datasets, and c) state-of-the-art CNN-based defect detection models used in the literature. Additionally, interpretability analysis reveals that our self pre-trained models, in comparison to ViT baselines, correctly focus on defect-relevant features such as cracks in the solder material. This demonstrates that our approach yields fault-specific feature representations, making our self pre-trained models viable for real-world defect detection in microelectronics.
https://arxiv.org/abs/2504.10021
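The masking step that the abstract above describes for MAE pre-training can be sketched as a random split of patch indices into visible and masked sets. The 0.75 mask ratio and the 14 x 14 patch grid used below are assumptions taken from common MAE setups, not values reported in the abstract, and the function name is hypothetical.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) lists at the given ratio."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


# Hypothetical 224 x 224 image cut into 16 x 16 patches: 14 * 14 = 196 patches.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
```

Only the visible patches would be passed to the encoder, while the decoder is asked to reconstruct the masked ones, so no labels are needed during pre-training.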
Neuro-developmental disorders are manifested as dysfunctions in cognition, communication, behaviour and adaptability, and deep learning-based computer-aided diagnosis (CAD) can relieve the increasingly strained healthcare resources devoted to neuroimaging. However, neuroimaging such as fMRI contains complex spatio-temporal features, which makes the corresponding representations susceptible to a variety of distractions and thus makes CAD less effective. For the first time, we present a Comorbidity-Informed Transfer Learning (CITL) framework for diagnosing neuro-developmental disorders using fMRI. In CITL, a new reinforced representation generation network is proposed, which first combines transfer learning with pseudo-labelling to remove interfering patterns from the temporal domain of fMRI and then generates new representations using an encoder-decoder architecture. The new representations are then used to train an architecturally simple classification network to obtain the CAD model. In particular, the framework fully considers the comorbidity mechanisms of neuro-developmental disorders and effectively integrates them with semi-supervised learning and transfer learning, providing new perspectives on interdisciplinary research. Experimental results demonstrate that CITL achieves competitive accuracies of 76.32% and 73.15% for detecting autism spectrum disorder and attention deficit hyperactivity disorder, respectively, outperforming existing related transfer learning work by 7.2% and 0.5%, respectively.
https://arxiv.org/abs/2504.09463
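The abstract above mentions pseudo-labelling without giving details, so the sketch below shows only the generic form of the idea: keep an unlabelled sample when the model's most confident class probability clears a threshold. The threshold value, function name, and probabilities are hypothetical and do not describe how CITL applies pseudo-labels to fMRI data.

```python
def pseudo_label(probabilities, threshold=0.9):
    """Return (sample_index, predicted_class) pairs for confident predictions."""
    selected = []
    for index, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((index, probs.index(confidence)))
    return selected


# Hypothetical class probabilities for three unlabelled samples.
unlabelled_probs = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]]
confident = pseudo_label(unlabelled_probs)
```

The selected pairs can then be added to the labelled training set for the next training round, which is the usual way pseudo-labels feed a semi-supervised loop.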
Reliable artificial intelligence (AI) models for medical image analysis often depend on large and diverse labeled datasets. Federated learning (FL) offers a decentralized and privacy-preserving approach to training but struggles in highly non-independent and identically distributed (non-IID) settings, where institutions with more representative data may experience degraded performance. Moreover, existing large-scale FL studies have been limited to adult datasets, neglecting the unique challenges posed by pediatric data, which introduces additional non-IID variability. To address these limitations, we analyzed n=398,523 adult chest radiographs from diverse institutions across multiple countries and n=9,125 pediatric images, leveraging transfer learning from general-purpose self-supervised image representations to classify pneumonia and cases with no abnormality. Using state-of-the-art vision transformers, we found that FL improved performance only for smaller adult datasets (P<0.001) but degraded performance for larger datasets (P<0.064) and pediatric cases (P=0.242). However, equipping FL with self-supervised weights significantly enhanced outcomes across pediatric cases (P=0.031) and most adult datasets (P<0.008), except the largest dataset (P=0.052). These findings underscore the potential of easily deployable general-purpose self-supervised image representations to address non-IID challenges in clinical FL applications and highlight their promise for enhancing patient outcomes and advancing pediatric healthcare, where data scarcity and variability remain persistent obstacles.
https://arxiv.org/abs/2504.08584
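The abstract above does not name the aggregation rule used in its federated learning experiments. As a sketch under that caveat, the code below shows federated averaging (FedAvg), a widely used rule in which the server averages client parameters weighted by local dataset size. The function name and the numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average flat parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(num_params)
    ]


# Hypothetical two-parameter models from a small and a large institution.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Because the average is weighted by sample count, the larger institution dominates the global model, which gives some intuition for why sites of different sizes can be affected differently in non-IID settings.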
The level of ripeness is essential in determining the quality of bananas. To correctly estimate banana maturity, the metrics of international marketing standards need to be considered. However, assessing the maturity of bananas at an industrial level is still carried out using manual methods. CNN models are an attractive tool for solving this problem, but their use is limited by the availability of sufficient data to train them reliably. On the other hand, state-of-the-art work reports that existing CNN models achieve acceptable accuracy in identifying banana maturity with the available data. For this reason, this work presents the generation of a robust dataset that combines real and synthetic data for different levels of banana ripeness. In addition, it proposes a simple CNN architecture that is trained with synthetic data and then improved using transfer learning to classify real data, which allows it to determine the level of maturity of the banana. The proposed CNN model is evaluated with several architectures, varying the hyper-parameter configurations and optimizers. The results show that the proposed CNN model reaches a high accuracy of 0.917 and a fast execution time.
https://arxiv.org/abs/2504.08568
Machine learning and Bayesian optimization (BO) algorithms can significantly accelerate the optimization of chemical reactions. Transfer learning can bolster the effectiveness of BO algorithms in low-data regimes by leveraging pre-existing chemical information or data outside the direct optimization task (i.e., source data). Large language models (LLMs) have demonstrated that chemical information present in foundation training data can give them utility for processing chemical data. Furthermore, they can be augmented with and help synthesize potentially multiple modalities of source chemical data germane to the optimization task. In this work, we examine how chemical information from LLMs can be elicited and used for transfer learning to accelerate the BO of reaction conditions to maximize yield. Specifically, we show that a survey-like prompting scheme and preference learning can be used to infer a utility function which models prior chemical information embedded in LLMs over a chemical parameter space; we find that the utility function shows modest correlation to true experimental measurements (yield) over the parameter space despite operating in a zero-shot setting. Furthermore, we show that the utility function can be leveraged to focus BO efforts in promising regions of the parameter space, improving the yield of the initial BO query and enhancing optimization in 4 of the 6 datasets studied. Overall, we view this work as a step towards bridging the gap between the chemistry knowledge embedded in LLMs and the capabilities of principled BO methods to accelerate reaction optimization.
https://arxiv.org/abs/2504.08874
Focal cortical dysplasia (FCD) type II is a major cause of drug-resistant epilepsy, often curable only by surgery. Despite its clinical importance, the diagnosis of FCD is very difficult in MRI because of subtle abnormalities, leading to misdiagnosis. This study investigates the use of 3D convolutional neural networks (3D-CNNs) for FCD detection, using a dataset of 170 subjects (85 FCD patients and 85 controls) composed of T1-weighted and FLAIR MRI scans. Specifically, it investigates the benefits of cross-modality transfer learning and explainable artificial intelligence (XAI) techniques, notably Gradient-weighted Class Activation Mapping (Grad-CAM). ResNet architectures (ResNet-18, -34, and -50) were implemented, employing transfer learning strategies that used pre-trained weights from segmentation tasks. Results indicate that transfer learning significantly enhances classification accuracy (up to 80.3%) and interpretability, as measured by a novel Heat-Score metric, which evaluates the model's focus on clinically relevant regions. Improvements in the Heat-Score metric underscore the model's seizure zone localization capabilities, bringing AI predictions and clinical insights closer together. These results highlight the importance of transfer learning, including cross-modality, and XAI in advancing AI-based medical diagnostics, especially for difficult-to-diagnose pathologies such as FCD.
https://arxiv.org/abs/2504.07775
We benchmark foundation model image embeddings for classification and retrieval in e-Commerce, evaluating their suitability for real-world applications. Our study spans embeddings from pre-trained convolutional and transformer models trained via supervised, self-supervised, and text-image contrastive learning. We assess full fine-tuning and transfer learning (top-tuning) on six diverse e-Commerce datasets: fashion, consumer goods, cars, food, and retail. Results show full fine-tuning consistently performs well, while text-image and self-supervised embeddings can match its performance with less training. While supervised embeddings remain stable across architectures, SSL and contrastive embeddings vary significantly, often benefiting from top-tuning. Top-tuning emerges as an efficient alternative to full fine-tuning, reducing computational costs. We also explore cross-tuning, noting that its impact depends on dataset characteristics. Our findings offer practical guidelines for embedding selection and fine-tuning strategies, balancing efficiency and performance.
https://arxiv.org/abs/2504.07567
Histopathological images contain a huge amount of information, which can make diagnosis an extremely time-consuming and tedious task. In this study, we developed a completely automated system to detect regions of interest (ROIs) in whole slide images (WSIs) of renal cell carcinoma (RCC), to reduce analysis time and assist pathologists in making more accurate decisions. The proposed approach is based on an efficient texture descriptor named dominant rotated local binary pattern (DRLBP) and a color transformation to reveal and exploit the immense texture variability at high microscopic magnification levels. The DRLBPs retain the structural information and utilize the magnitude values in a local neighborhood for more discriminative power. For the classification of the relevant ROIs, feature extraction from WSI patches was performed on the color channels separately to form the histograms. Next, we used the most frequently occurring patterns as a feature selection step to discard non-informative features. The performances of different classifiers on a set of 1800 kidney cancer patches originating from 12 whole slide images were compared and evaluated. Furthermore, the small size of the image dataset allowed us to investigate a deep learning approach based on transfer learning for image patch classification, using deep features and fine-tuning methods. High recognition accuracy was obtained and the classifiers are efficient; the best precision, 99.17%, was achieved with an SVM. Moreover, transfer learning models achieve comparable performance, with the highest precision, 98.50%, reached using ResNet-50. The results show that the proposed approach classifies images very efficiently and is effective at identifying ROIs. This study presents an automatic system to detect regions of interest relevant to the diagnosis of kidney cancer in whole slide histopathology images.
https://arxiv.org/abs/2504.07313
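To give a feel for the texture descriptor named in the abstract above, the sketch below computes a basic 3 x 3 local binary pattern and a simplified rotated variant that starts the bit ordering at the neighbour differing most from the centre. This is an illustrative simplification, not the paper's DRLBP: the dominant-pattern selection, the color transformation, and the histogram step are omitted, and the function name and sample patches are hypothetical.

```python
# Offsets of the 8 neighbours, clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col, rotate=False):
    """Return the 8-bit LBP code of the pixel at (row, col).

    With rotate=True, bit 0 is assigned to the neighbour with the largest
    absolute difference from the centre, so the code no longer depends on
    where that dominant neighbour sits around the centre.
    """
    center = image[row][col]
    neighbours = [image[row + dr][col + dc] for dr, dc in OFFSETS]
    start = 0
    if rotate:
        diffs = [abs(n - center) for n in neighbours]
        start = diffs.index(max(diffs))
    code = 0
    for k in range(8):
        if neighbours[(start + k) % 8] >= center:
            code |= 1 << k
    return code


# Hypothetical patches: the same bright corner seen at two orientations.
patch_a = [[9, 4, 4], [4, 5, 4], [4, 4, 4]]
patch_b = [[4, 4, 9], [4, 5, 4], [4, 4, 4]]
plain = (lbp_code(patch_a, 1, 1), lbp_code(patch_b, 1, 1))
rotated = (lbp_code(patch_a, 1, 1, rotate=True), lbp_code(patch_b, 1, 1, rotate=True))
```

The plain codes differ between the two orientations while the rotated codes agree, which is the property that makes a rotated descriptor useful for tissue that has no canonical orientation under the microscope.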
In this paper, we propose EDIT (Encoder-Decoder Image Transformer), a novel architecture designed to mitigate the attention sink phenomenon observed in Vision Transformer models. Attention sink occurs when an excessive amount of attention is allocated to the [CLS] token, distorting the model's ability to effectively process image patches. To address this, we introduce a layer-aligned encoder-decoder architecture, where the encoder utilizes self-attention to process image patches, while the decoder uses cross-attention to focus on the [CLS] token. Unlike traditional encoder-decoder frameworks, where the decoder depends solely on high-level encoder representations, EDIT allows the decoder to extract information starting from low-level features, progressively refining the representation layer by layer. EDIT is naturally interpretable, as demonstrated through sequential attention maps that illustrate the refined, layer-by-layer focus on key image features. Experiments on ImageNet-1k and ImageNet-21k, along with transfer learning tasks, show that EDIT achieves consistent performance improvements over DeiT3 models. These results highlight the effectiveness of EDIT's design in addressing attention sink and improving visual feature extraction.
https://arxiv.org/abs/2504.06738
This paper presents a novel approach to constructing an English-to-Telugu translation model by leveraging transfer learning techniques and addressing the challenges associated with low-resource languages. Utilizing the Bharat Parallel Corpus Collection (BPCC) as the primary dataset, the model incorporates iterative backtranslation to generate synthetic parallel data, effectively augmenting the training dataset and enhancing the model's translation capabilities. The research focuses on a comprehensive strategy for improving model performance through data augmentation, optimization of training parameters, and the effective use of pre-trained models. These methodologies aim to create a robust translation system that can handle diverse sentence structures and linguistic nuances in both English and Telugu. This work highlights the significance of innovative data handling techniques and the potential of transfer learning in overcoming limitations posed by sparse datasets in low-resource languages. The study contributes to the field of machine translation and seeks to improve communication between English and Telugu speakers in practical contexts.
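This paper proposes a novel method for building an English-to-Telugu translation model using transfer learning techniques, and addresses the challenges associated with low-resource languages. The study uses the Bharat Parallel Corpus Collection (BPCC) as its main dataset, and the model applies iterative backtranslation to generate synthetic parallel data, which effectively expands the training set and improves translation capability. The research centers on a comprehensive strategy for raising model performance through data augmentation, optimized training parameters, and full use of pre-trained models. These methods aim to create a robust translation system able to handle the varied sentence structures and linguistic nuances of English and Telugu. The work emphasizes the importance of innovative data handling techniques and the potential of transfer learning for overcoming the limits of sparse datasets in low-resource languages. The study contributes to the field of machine translation and aims to improve communication between English and Telugu speakers in practical settings.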
https://arxiv.org/abs/2504.05914
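The iterative backtranslation step mentioned in the translation abstract can be sketched as a data-flow loop. The code below is a toy under stated assumptions, not the paper's system: `train` and `translate` are hypothetical stand-ins (a word-for-word lookup table instead of a neural model), and the uppercase tokens are placeholders rather than real Telugu. What it does show is the core idea: a backward model turns real target-side sentences into synthetic sources, the forward model is retrained on the enlarged set, and the roles then swap.

```python
def train(pairs):
    """Toy 'model' (assumption): a word lookup table learned from position-aligned sentence pairs."""
    table = {}
    for src, tgt in pairs:
        for s, t in zip(src.split(), tgt.split()):
            table.setdefault(s, t)
    return table


def translate(model, sentence):
    """Translate word by word; unknown words are copied unchanged."""
    return " ".join(model.get(w, w) for w in sentence.split())


def iterative_backtranslation(parallel, mono_src, mono_tgt, rounds=2):
    """Alternate between the two directions, each time adding synthetic pairs to the real ones."""
    reversed_parallel = [(t, s) for s, t in parallel]
    fwd = train(parallel)            # source -> target
    bwd = train(reversed_parallel)   # target -> source
    for _ in range(rounds):
        # Real target sentences paired with synthetic sources from the backward model.
        synthetic_fwd = [(translate(bwd, t), t) for t in mono_tgt]
        fwd = train(list(parallel) + synthetic_fwd)
        # Real source sentences paired with synthetic targets from the forward model.
        synthetic_bwd = [(translate(fwd, s), s) for s in mono_src]
        bwd = train(reversed_parallel + synthetic_bwd)
    return fwd, bwd


fwd_model, bwd_model = iterative_backtranslation(
    parallel=[("red cat", "RED CAT")],
    mono_src=["cat cat"],
    mono_tgt=["CAT RED"],
)
```

In a real low-resource setting the synthetic side is noisy, which is why the genuine sentence is always placed on the output side of the pair being trained on, as done in both halves of the loop above.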
Autism spectrum disorder (ASD) is a highly disabling mental disease that significantly impairs patients' social interaction ability, making early screening and intervention critical. With the development of machine learning and neuroimaging technology, extensive research has been conducted on machine classification of ASD based on structural Magnetic Resonance Imaging (s-MRI). However, most studies use datasets in which participants are older than 5 years, and they lack interpretability. In this paper, we propose a machine learning method for ASD classification in children aged 0.92 to 4.83 years, based on s-MRI features extracted using a contrastive variational autoencoder (CVAE). A total of 78 s-MRI scans, collected from Shenzhen Children's Hospital, are used to train the CVAE, which consists of both an ASD-specific feature channel and a common shared feature channel. The ASD participants represented by ASD-specific features can be easily discriminated from TC participants represented by the common shared features. Because predictive accuracy can degrade when the data size is extremely small, a transfer learning strategy is proposed as a potential solution. Finally, we conduct neuroanatomical interpretation based on the correlation between the s-MRI features extracted by the CVAE and the surface area of different cortical regions, which reveals potential biomarkers that could help target treatments of ASD in the future.
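Autism spectrum disorder (ASD) is a highly disabling mental disease that severely impairs patients' social interaction ability, so early screening and intervention are essential. With advances in machine learning and neuroimaging technology, automatic classification of ASD based on structural magnetic resonance imaging (s-MRI) has been widely studied. However, most existing studies use datasets whose participants are all older than 5 years, and these studies lack interpretability. This paper proposes a method for classifying ASD in children aged 0.92 to 4.83 years, based on s-MRI features extracted with a contrastive variational autoencoder (CVAE). We use 78 s-MRI scans collected from Shenzhen Children's Hospital to train the CVAE model, which comprises an ASD-specific feature channel and a common shared feature channel. ASD participants represented by the ASD-specific features can be easily distinguished from control (TC) participants represented by the common shared features. Because classification accuracy may drop when the amount of data is extremely small, we propose a transfer learning strategy as a potential solution. Finally, we carry out a neuroanatomical interpretation based on the correlation between the s-MRI features extracted by the CVAE and the surface area of different cortical regions, revealing potential biomarkers that could guide the targeting of ASD treatments in the future.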
https://arxiv.org/abs/2307.00976
As the prevalence of mental health crises increases on social media platforms, identifying and preventing potential harm has become an urgent challenge. This study introduces a large language model (LLM)-based text transfer recognition method for social network crisis intervention, enhanced with domain-specific mental health knowledge. We propose a multi-level framework that incorporates transfer learning using BERT, and integrates mental health knowledge, sentiment analysis, and behavior prediction techniques. The framework includes a crisis annotation tool trained on social media datasets from real-world events, enabling the model to detect nuanced emotional cues and identify psychological crises. Experimental results show that the proposed method outperforms traditional models in crisis detection accuracy and exhibits greater sensitivity to subtle emotional and contextual variations.
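As mental health crises on social media platforms increase, identifying and preventing potential harm has become an urgent challenge. This study introduces a large language model (LLM)-based text transfer recognition method for social network crisis intervention that incorporates domain-specific mental health knowledge. We propose a multi-level framework that applies transfer learning with BERT and integrates mental health knowledge, sentiment analysis, and behavior prediction techniques. The framework includes a crisis annotation tool trained on social media datasets from real-world events, which allows the model to detect subtle emotional cues and recognize psychological crises. Experimental results show that the proposed method surpasses traditional models in crisis detection accuracy and is more sensitive to subtle emotional and contextual variations.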
https://arxiv.org/abs/2504.07983
Information on standing dead trees is important for understanding forest ecosystem functioning and resilience but has been lacking over large geographic regions. Climate change has caused large-scale tree mortality events that can remain undetected due to limited data. In this study, we propose a novel method for segmenting standing dead trees using aerial multispectral orthoimages. Because access to annotated datasets has been a significant problem in forest remote sensing due to the need for forest expertise, we introduce a method for domain transfer that uses domain adaptation to learn a transformation from a source domain X to a target domain Y. In this image-to-image translation task, we aim to utilize available annotations in the target domain by pre-training a segmentation network. When images from a new study site without annotations are introduced (source domain X), these images are transformed into the target domain. Then, transfer learning is applied by running inference with the pre-trained network on the domain-adapted images. In addition to investigating the feasibility of current domain adaptation approaches for this objective, we propose a novel approach called the Attention-guided Domain Adaptation Network (ADA-Net) with enhanced contrastive learning. ADA-Net achieves new state-of-the-art domain adaptation performance, outperforming existing approaches. We have evaluated the proposed approach using two datasets from Finland and the USA. The USA images are converted to the Finland domain, and we show that the synthetic USA2Finland dataset exhibits similar characteristics to the Finland domain images. The software implementation is shared online, and the data is publicly available.
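Information about standing dead trees is essential for understanding the functioning and resilience of forest ecosystems, yet it has been lacking across large geographic regions. Large-scale tree mortality events caused by climate change may go undetected because data are limited. In this study, we propose a new method for segmenting standing dead trees from aerial multispectral orthoimages. Because obtaining annotated datasets has long been a major problem in forest remote sensing (annotation requires forest expertise), we introduce a domain transfer method that uses domain adaptation to learn a transformation from a source domain X to a target domain Y. In this image-to-image translation task, we aim to exploit the annotations available in the target domain by pre-training a segmentation network. When unannotated images from a new study site are introduced (source domain X), they are transformed into the target domain. Transfer learning is then applied by running inference with the pre-trained network on the domain-adapted images. Besides examining the feasibility of current domain adaptation approaches for this goal, we propose a new approach, the Attention-guided Domain Adaptation Network (ADA-Net), with enhanced contrastive learning. ADA-Net achieves new state-of-the-art domain adaptation performance, exceeding existing approaches. We evaluated the proposed approach on two datasets, from Finland and the USA. The USA images are converted to the Finland domain, and we show that the synthetic USA2Finland dataset has characteristics similar to those of the Finland domain images. The software implementation and the data are publicly available.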
https://arxiv.org/abs/2504.04271