This study presents the first comprehensive evaluation of spatial generalization techniques, which are essential for the practical deployment of deep learning-based radio-frequency (RF) sensing. Focusing on people counting in indoor environments using frequency-modulated continuous-wave (FMCW) multiple-input multiple-output (MIMO) radar, we systematically investigate a broad set of approaches, including amplitude-based statistical preprocessing (sigmoid weighting and threshold zeroing), frequency-domain filtering, autoencoder-based background suppression, data augmentation strategies, and transfer learning. Experimental results collected across two environments with different layouts demonstrate that sigmoid-based amplitude weighting consistently achieves superior cross-environment performance, yielding 50.1% and 55.2% reductions in root-mean-square error (RMSE) and mean absolute error (MAE), respectively, compared with baseline methods. Data augmentation provides additional though modest benefits, with improvements up to 8.8% in MAE. By contrast, transfer learning proves indispensable for large spatial shifts, achieving 82.1% and 91.3% reductions in RMSE and MAE, respectively, with 540 target-domain samples. Taken together, these findings establish a highly practical direction for developing radar sensing systems capable of maintaining robust accuracy under spatial variations by integrating deep learning models with amplitude-based preprocessing and efficient transfer learning.
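The sigmoid amplitude weighting that delivers the abstract's headline cross-environment gains can be sketched in a few lines of NumPy. The per-bin standardization and the steepness `k` below are assumptions for illustration; the abstract does not specify the exact statistics the paper uses.

```python
import numpy as np

def sigmoid_weight(frames, k=5.0):
    """Amplitude-based sigmoid weighting (hypothetical form).

    frames: (T, R) array of non-negative amplitude spectra over T frames.
    Bins whose amplitude rises well above their per-bin mean (a likely
    moving person) are kept almost unchanged, while bins near the static
    background level are attenuated smoothly rather than hard-thresholded.
    """
    mu = frames.mean(axis=0, keepdims=True)           # per-range-bin mean
    sigma = frames.std(axis=0, keepdims=True) + 1e-8  # per-range-bin std
    z = (frames - mu) / sigma                         # standardized amplitude
    w = 1.0 / (1.0 + np.exp(-k * z))                  # smooth weight in (0, 1)
    return frames * w
```

Because the weight is strictly between 0 and 1, the transform can only attenuate, which is what distinguishes it from the threshold-zeroing variant the paper also tests.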
https://arxiv.org/abs/2512.13018
Transformer architecture search (TAS) aims to automatically discover efficient vision transformers (ViTs), reducing the need for manual design. Existing TAS methods typically train an over-parameterized network (i.e., a supernet) that encompasses all candidate architectures (i.e., subnets). However, all subnets share the same set of weights, which leads to interference that severely degrades the smaller subnets. We have found that well-trained small subnets can serve as a good foundation for training larger ones. Motivated by this, we propose a progressive training framework, dubbed GrowTAS, that begins with training small subnets and gradually incorporates larger ones. This reduces interference and stabilizes the training process. We also introduce GrowTAS+, which fine-tunes only a subset of weights to further enhance the performance of large subnets. Extensive experiments on ImageNet and several transfer learning benchmarks, including CIFAR-10/100, Flowers, CARS, and INAT-19, demonstrate the effectiveness of our approach over current TAS methods.
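A minimal sketch of the progressive idea: decide, per epoch, which subnet widths are active, so small subnets are well trained before larger ones join. The evenly spaced milestones below are an assumption; the actual GrowTAS growth schedule is not given in the abstract.

```python
def grow_schedule(widths, total_epochs):
    """Return, for each epoch, the list of subnet widths being trained.

    Sketch of a GrowTAS-style progressive curriculum: the smallest
    subnet trains first, and each larger width is unlocked at an evenly
    spaced milestone, so big subnets always start from shared weights
    already shaped by well-trained small ones.
    """
    widths = sorted(widths)
    step = max(1, total_epochs // len(widths))  # epochs per growth stage
    plan = []
    for epoch in range(total_epochs):
        unlocked = min(len(widths), epoch // step + 1)
        plan.append(widths[:unlocked])
    return plan
```

For example, `grow_schedule([64, 128, 256], 9)` trains only the 64-wide subnet for the first third of training, then adds the 128- and 256-wide subnets in turn.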
https://arxiv.org/abs/2512.12296
The acquisition cost for large, annotated motion datasets remains a critical bottleneck for skeletal-based Human Activity Recognition (HAR). Although Text-to-Motion (T2M) generative models offer a compelling, scalable source of synthetic data, their training objectives, which emphasize general artistic motion, and dataset structures fundamentally differ from HAR's requirements for kinematically precise, class-discriminative actions. This disparity creates a significant domain gap, making generalist T2M models ill-equipped for generating motions suitable for HAR classifiers. To address this challenge, we propose KineMIC (Kinetic Mining In Context), a transfer learning framework for few-shot action synthesis. KineMIC adapts a T2M diffusion model to an HAR domain by hypothesizing that semantic correspondences in the text encoding space can provide soft supervision for kinematic distillation. We operationalize this via a kinetic mining strategy that leverages CLIP text embeddings to establish correspondences between sparse HAR labels and T2M source data. This process guides fine-tuning, transforming the generalist T2M backbone into a specialized few-shot Action-to-Motion generator. We validate KineMIC using HumanML3D as the source T2M dataset and a subset of NTU RGB+D 120 as the target HAR domain, randomly selecting just 10 samples per action class. Our approach generates significantly more coherent motions, providing a robust data augmentation source that delivers a +23.1% accuracy points improvement. Animated illustrations and supplementary materials are available at (this https URL).
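The kinetic-mining step, matching sparse HAR labels to T2M source clips through a shared text-embedding space, reduces to a nearest-neighbor search. A toy version with plain vectors (CLIP embeddings in the paper; the top-k selection rule here is an assumption):

```python
import numpy as np

def mine_source_samples(label_emb, caption_embs, k=5):
    """Kinetic-mining sketch: rank T2M source captions by cosine
    similarity to an HAR class label in a shared text-embedding space
    and return the indices of the top-k matches, which then serve as
    soft supervision for fine-tuning the T2M backbone."""
    a = label_emb / np.linalg.norm(label_emb)
    b = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
    sims = b @ a                      # cosine similarity per caption
    return np.argsort(-sims)[:k]      # indices of the k closest captions
```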
https://arxiv.org/abs/2512.11654
Efficient crop detection via Unmanned Aerial Vehicles is critical for scaling precision agriculture, yet it remains challenging due to the small scale of targets and environmental variability. This paper addresses the detection of rice seedlings in paddy fields by leveraging a Faster R-CNN architecture initialized via transfer learning. To overcome the specific difficulties of detecting minute objects in high-resolution aerial imagery, we curate a significant UAV dataset for training and rigorously evaluate the model's generalization capabilities. Specifically, we validate performance across three distinct test sets acquired at different temporal intervals, thereby assessing robustness against varying imaging conditions. Our empirical results demonstrate that transfer learning not only facilitates the rapid convergence of object detection models in agricultural contexts but also yields consistent performance despite domain shifts in image acquisition.
https://arxiv.org/abs/2512.11360
In recent years, the incidence of vision-threatening eye diseases has risen dramatically, necessitating scalable and accurate screening solutions. This paper presents a comprehensive study on deep learning architectures for the automated diagnosis of ocular conditions. To mitigate the "black-box" limitations of standard convolutional neural networks (CNNs), we implement a pipeline that combines deep feature extraction with interpretable image processing modules. Specifically, we focus on high-fidelity retinal vessel segmentation as an auxiliary task to guide the classification process. By grounding the model's predictions in clinically relevant morphological features, we aim to bridge the gap between algorithmic output and expert medical validation, thereby reducing false positives and improving deployment viability in clinical settings.
https://arxiv.org/abs/2512.10608
Accurate and rapid state-of-health (SOH) monitoring plays an important role in indicating energy information for lithium-ion battery-powered portable mobile devices. To confront their variable working conditions, transfer learning (TL) emerges as a promising technique for leveraging knowledge from data-rich source working conditions, significantly reducing the training data required for SOH monitoring under target working conditions. However, traditional TL-based SOH monitoring is infeasible in portable mobile devices, since substantial computational resources are consumed during the TL stage, unexpectedly reducing working endurance. To address these challenges, this paper proposes a lightweight TL-based SOH monitoring approach with constructive incremental transfer learning (CITL). First, taking advantage of the unlabeled data in the target domain, a semi-supervised TL mechanism is proposed to minimize the monitoring residual in a constructive way, by iteratively adding network nodes in the CITL. Second, the cross-domain learning ability of node parameters in CITL is comprehensively guaranteed through structural risk minimization, transfer mismatch minimization, and manifold consistency maximization. Moreover, a convergence analysis of the CITL is given, theoretically guaranteeing the efficacy of TL performance and network compactness. Finally, the proposed approach is verified through extensive experiments on a realistic unmanned aerial vehicle (UAV) battery dataset collected from dozens of flight missions. Specifically, the CITL outperforms SS-TCA, MMD-LSTM-DA, DDAN, BO-CNN-TL, and AS$^3$LSTM in SOH estimation by 83.73%, 61.15%, 28.24%, 87.70%, and 57.34%, respectively, as evaluated by root-mean-square error.
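The constructive loop, adding one node at a time until the monitoring residual is small, can be sketched as below. Random input weights and a tanh activation stand in for CITL's optimized node parameters, which the abstract does not detail; only the grow-until-residual-shrinks structure is illustrated.

```python
import numpy as np

def constructive_fit(X, y, max_nodes=20, tol=1e-3, seed=0):
    """Constructive incremental sketch: grow a single-hidden-layer
    network one node at a time. Each new node's output weight is the
    least-squares coefficient against the current residual, so every
    step can only reduce (never increase) the residual norm."""
    rng = np.random.default_rng(seed)
    H = np.empty((len(X), 0))   # hidden activations accumulated so far
    beta = np.empty(0)          # output weights, one per node
    resid = y.copy()
    for _ in range(max_nodes):
        w = rng.normal(size=X.shape[1])
        b = rng.normal()
        h = np.tanh(X @ w + b)            # new node's activations
        coef = (h @ resid) / (h @ h)      # optimal output weight for this node
        H = np.column_stack([H, h])
        beta = np.append(beta, coef)
        resid = resid - coef * h          # updated monitoring residual
        if np.sqrt(np.mean(resid**2)) < tol:
            break
    return H, beta, resid
```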
https://arxiv.org/abs/2512.08512
Large Language Models (LLMs) such as GPT-3 exhibit strong generalization capabilities. Through transfer learning techniques such as fine-tuning and prompt tuning, they can be adapted to various downstream tasks with minimal parameter adjustments. This approach is particularly common in the field of Natural Language Processing (NLP). This paper aims to explore the effectiveness of common prompt tuning methods in 3D object detection. We investigate whether a model trained on the large-scale Waymo dataset can serve as a foundation model and adapt to other scenarios within the 3D object detection field. This paper sequentially examines the impact of prompt tokens and prompt generators, and further proposes a Scene-Oriented Prompt Pool (\textbf{SOP$^2$}). We demonstrate the effectiveness of prompt pools in 3D object detection, with the goal of inspiring future researchers to delve deeper into the potential of prompts in the 3D field.
https://arxiv.org/abs/2512.08223
This study revisits the findings of Carl et al., who evaluated the pre-trained Google Inception-ResNet-v2 model for automated detection of European wild mammal species in camera trap images. To assess the reproducibility and generalizability of their approach, we reimplemented the experiment from scratch using openly available resources and a different dataset consisting of 900 images spanning 90 species. After minimal preprocessing, we obtained an overall classification accuracy of 62%, broadly consistent with the 71% reported in the original work despite the differences in datasets. As in the original study, per-class performance varied substantially, as indicated by a macro F1 score of 0.28, highlighting limitations in generalization when labels do not align directly with ImageNet classes. Our results confirm that pretrained convolutional neural networks can provide a practical baseline for wildlife species identification but also reinforce the need for species-specific adaptation or transfer learning to achieve consistent, high-quality predictions.
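The gap between 62% overall accuracy and a macro F1 of 0.28 is exactly what macro averaging is designed to expose: rare classes count as much as common ones. A toy computation makes the divergence concrete:

```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes):
    """Macro F1: unweighted mean of per-class F1 scores. A model can
    reach decent overall accuracy by doing well on frequent classes
    while macro F1 stays low because many rare classes score zero,
    which is the accuracy-vs-macro-F1 gap the study reports."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))
```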
https://arxiv.org/abs/2512.07305
Pre-trained Vision-Language Models (VLMs), \textit{e.g.} CLIP, have become essential tools in multimodal transfer learning. However, fine-tuning VLMs in few-shot scenarios poses significant challenges in balancing task-specific adaptation against generalization in the resulting model. Meanwhile, current research has predominantly focused on prompt-based adaptation methods, leaving adapter-based approaches underexplored and revealing notable performance gaps. To address these challenges, we introduce a novel Reconstruction-based Multimodal Adapter (RMAdapter), which leverages a dual-branch architecture. Unlike conventional single-branch adapters, RMAdapter consists of: (1) an adaptation branch that injects task-specific knowledge through parameter-efficient fine-tuning, and (2) a reconstruction branch that preserves general knowledge by reconstructing latent-space features back into the original feature space. This design facilitates a dynamic balance between general and task-specific knowledge. Importantly, although RMAdapter introduces an additional reconstruction branch, it is carefully optimized to remain lightweight. By computing the reconstruction loss locally at each layer and sharing projection modules, the overall computational overhead is kept minimal. A consistency constraint is also incorporated to better regulate the trade-off between discriminability and generalization. We comprehensively evaluate the effectiveness of RMAdapter on three representative tasks: generalization to new categories, generalization to new target datasets, and domain generalization. Without relying on data augmentation or duplicate prompt designs, our RMAdapter consistently outperforms state-of-the-art approaches across all evaluation metrics.
https://arxiv.org/abs/2512.06811
Transfer Learning (TL) has accelerated the rapid development and availability of large language models (LLMs) for mainstream natural language processing (NLP) use cases. However, training and deploying such gigantic LLMs in resource-constrained, real-world healthcare situations remains challenging. This study addresses the limited support available to visually impaired users and speakers of low-resource languages such as Hindi who require medical assistance in rural environments. We propose PDFTEMRA (Performant Distilled Frequency Transformer Ensemble Model with Random Activations), a compact transformer-based architecture that integrates model distillation, frequency-domain modulation, ensemble learning, and randomized activation patterns to reduce computational cost while preserving language understanding performance. The model is trained and evaluated on medical question-answering and consultation datasets tailored to Hindi and accessibility scenarios, and its performance is compared against standard NLP state-of-the-art model baselines. Results demonstrate that PDFTEMRA achieves comparable performance with substantially lower computational requirements, indicating its suitability for accessible, inclusive, low-resource medical NLP applications.
https://arxiv.org/abs/2512.06734
Autism spectrum disorder (ASD) is a complex neurodevelopmental condition whose rising prevalence places increasing demands on a lengthy diagnostic process. Machine learning (ML) has shown promise in automating ASD diagnosis, but most existing models operate as black boxes and are typically trained on a single dataset, limiting their generalizability. In this study, we introduce a transparent and interpretable ML approach that leverages BioBERT, a state-of-the-art language model, to analyze unstructured clinical text. The model is trained to label descriptions of behaviors and map them to diagnostic criteria, which are then used to assign a final label (ASD or not). We evaluate transfer learning, the ability to transfer knowledge to new data, using two distinct real-world datasets. We trained on the datasets both sequentially and mixed together, and compared the performance of the best models and their ability to transfer to new data. We also created a black-box approach and repeated this transfer process for comparison. Our transparent model demonstrated robust performance, with the mixed-data training strategy yielding the best results (97% sensitivity, 98% specificity). Sequential training across datasets led to a slight drop in performance, highlighting the importance of training data order. The black-box model performed worse (90% sensitivity, 96% specificity) whether trained sequentially or with mixed data. Overall, our transparent approach outperformed the black-box approach. Mixing datasets during training resulted in slightly better performance and should be the preferred approach when practically possible. This work paves the way for more trustworthy, generalizable, and clinically actionable AI tools in neurodevelopmental diagnostics.
https://arxiv.org/abs/2512.06161
With the rapid advancement of technology, 3D data acquisition and utilization have become increasingly prevalent across various fields, including computer vision, robotics, and geospatial analysis. 3D data, captured through methods such as 3D scanners, LiDARs, and RGB-D cameras, provides rich geometric, shape, and scale information. When combined with 2D images, 3D data offers machines a comprehensive understanding of their environment, benefiting applications like autonomous driving, robotics, remote sensing, and medical treatment. This dissertation focuses on three main areas: supervised representation learning for point cloud primitive segmentation, self-supervised learning methods, and transfer learning from 2D to 3D. Our approach, which integrates pre-trained 2D models to support 3D network training, significantly improves 3D understanding without merely transforming 2D data. Extensive experiments validate the effectiveness of our methods, showcasing their potential to advance point cloud representation learning by effectively integrating 2D knowledge.
https://arxiv.org/abs/2512.06058
This study focuses on event detection in optical fibers, specifically classifying six events using the Phase-OTDR system. A novel approach is introduced to enhance Phase-OTDR data analysis by transforming 1D data into grayscale images through techniques such as Gramian Angular Difference Field, Gramian Angular Summation Field, and Recurrence Plot. These grayscale images are combined into a multi-channel RGB representation, enabling more robust and adaptable analysis using transfer learning models. The proposed methodology achieves high classification accuracies of 98.84% and 98.24% with the EfficientNetB0 and DenseNet121 models, respectively. A 5-fold cross-validation process confirms the reliability of these models, with test accuracy rates of 99.07% and 98.68%. Using a publicly available Phase-OTDR dataset, the study demonstrates an efficient approach to understanding optical fiber events while reducing dataset size and improving analysis efficiency. The results highlight the transformative potential of image-based analysis in interpreting complex fiber optic sensing data, offering significant advancements in the accuracy and reliability of fiber optic monitoring systems. The codes and the corresponding image-based dataset are made publicly available on GitHub to support further research: this https URL.
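The 1D-to-image encoding is standard enough to sketch directly: rescale the trace to [-1, 1], map it to polar angles, and build the two Gramian angular fields plus a recurrence plot as the three channels. The min-max normalization and the recurrence threshold below are assumptions; the paper does not state its exact settings in the abstract.

```python
import numpy as np

def encode_rgb(x, rp_eps=0.1):
    """Encode a 1D Phase-OTDR trace as a 3-channel image:
    channel 0 = Gramian Angular Summation Field (GASF),
    channel 1 = Gramian Angular Difference Field (GADF),
    channel 2 = Recurrence Plot (RP, binary within threshold rp_eps)."""
    x = np.asarray(x, dtype=float)
    # rescale to [-1, 1] so arccos is well defined
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1
    x = np.clip(x, -1.0, 1.0)
    phi = np.arccos(x)                                # polar angle per sample
    gasf = np.cos(phi[:, None] + phi[None, :])        # cos(phi_i + phi_j)
    gadf = np.sin(phi[None, :] - phi[:, None])        # sin(phi_j - phi_i)
    rp = (np.abs(x[:, None] - x[None, :]) < rp_eps).astype(float)
    return np.stack([gasf, gadf, rp], axis=-1)
```

The resulting (N, N, 3) arrays can then be fed to any ImageNet-pretrained backbone, which is the transfer-learning step the paper benchmarks with EfficientNetB0 and DenseNet121.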
https://arxiv.org/abs/2512.05830
Medical image classification plays an increasingly vital role in identifying various diseases by classifying medical images, such as X-rays, MRIs and CT scans, into different categories based on their features. In recent years, deep learning techniques have attracted significant attention in medical image classification. However, it is usually infeasible to train an entire large deep learning model from scratch. To address this issue, one of the solutions is the transfer learning (TL) technique, where a pre-trained model is reused for a new task. In this paper, we present a comprehensive analysis of TL techniques for medical image classification using deep convolutional neural networks. We evaluate six pre-trained models (AlexNet, VGG16, ResNet18, ResNet34, ResNet50, and InceptionV3) on a custom chest X-ray dataset for disease detection. The experimental results demonstrate that InceptionV3 consistently outperforms other models across all the standard metrics. The ResNet family shows progressively better performance with increasing depth, whereas VGG16 and AlexNet perform reasonably well but with lower accuracy. In addition, we also conduct uncertainty analysis and runtime comparison to assess the robustness and computational efficiency of these models. Our findings reveal that TL is beneficial in most cases, especially with limited data, but the extent of improvement depends on several factors such as model architecture, dataset size, and domain similarity between source and target tasks. Moreover, we demonstrate that with a well-trained feature extractor, only a lightweight feedforward model is enough to provide efficient prediction. As such, this study contributes to the understanding of TL in medical image classification, and provides insights for selecting appropriate models based on specific requirements.
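The closing observation, that a frozen, well-trained feature extractor needs only a lightweight feedforward head, corresponds to training a small softmax classifier on pre-extracted features. A toy NumPy sketch (the head architecture and the plain gradient-descent optimizer are assumptions, not the paper's exact setup):

```python
import numpy as np

def train_linear_head(feats, labels, n_classes, lr=0.5, epochs=200):
    """Train only a linear softmax head on features produced by a
    frozen pre-trained extractor. The backbone never updates; all
    learning happens in the small weight matrix W and bias b."""
    W = np.zeros((feats.shape[1], n_classes))
    b = np.zeros(n_classes)
    y = np.eye(n_classes)[labels]                 # one-hot targets
    for _ in range(epochs):
        logits = feats @ W + b
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)         # softmax probabilities
        g = (p - y) / len(feats)                  # cross-entropy gradient
        W -= lr * feats.T @ g
        b -= lr * g.sum(axis=0)
    return W, b
```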
https://arxiv.org/abs/2512.04397
Foreign Object Debris (FOD) within aircraft fuel tanks presents critical safety hazards including fuel contamination, system malfunctions, and increased maintenance costs. Despite the severity of these risks, there is a notable lack of dedicated datasets for the complex, enclosed environments found inside fuel tanks. To bridge this gap, we present a novel dataset, FOD-S2R, composed of real and synthetic images of FOD within a simulated aircraft fuel tank. Unlike existing datasets that focus on external or open-air environments, our dataset is the first to systematically evaluate the effectiveness of synthetic data in enhancing real-world FOD detection performance in confined, closed structures. The real-world subset consists of 3,114 high-resolution images captured in a controlled fuel tank replica, while the synthetic subset includes 3,137 images generated using Unreal Engine. The dataset spans a variety of fields of view (FOV), object distances, lighting conditions, colors, and object sizes. Prior research has demonstrated that synthetic data can reduce reliance on extensive real-world annotations and improve the generalizability of vision models. Thus, we benchmark several state-of-the-art object detection models and demonstrate that introducing synthetic data improves detection accuracy and generalization to real-world conditions. These experiments demonstrate the effectiveness of synthetic data in enhancing model performance and narrowing the Sim2Real gap, providing a valuable foundation for developing automated FOD detection systems for aviation maintenance.
https://arxiv.org/abs/2512.01315
We present a transfer-learning generative downscaling framework to reconstruct fine-resolution satellite images from coarse-scale inputs. Our approach combines a lightweight U-Net transfer encoder with a diffusion-based generative model. The simpler U-Net is first pretrained on a long time series of coarse-resolution data to learn spatiotemporal representations; its encoder is then frozen and transferred to a larger downscaling model as physically meaningful latent features. Our application uses NASA's MERRA-2 reanalysis as the low-resolution source domain (50 km) and the GEOS-5 Nature Run (G5NR) as the high-resolution target (7 km). Our study area covered a large region in Asia, made computationally tractable by splitting it into two subregions and four seasons. Domain similarity analysis using Wasserstein distances confirmed minimal distributional shift between MERRA-2 and G5NR, validating the safety of parameter-frozen transfer. Across seasonal and regional splits, our model achieved excellent performance (R2 = 0.65 to 0.94), outperforming comparison models including deterministic U-Nets, variational autoencoders, and prior transfer learning baselines. Out-of-data evaluations using semivariograms, ACF/PACF, and lag-based RMSE/R2 demonstrated that the predicted downscaled images preserved physically consistent spatial variability and temporal autocorrelation, enabling stable autoregressive reconstruction beyond the G5NR record. These results show that transfer-enhanced diffusion models provide a robust and physically coherent solution for downscaling a long time series of coarse-resolution images with limited training periods. This advancement has significant implications for improving environmental exposure assessment and long-term environmental monitoring.
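The domain-similarity check is a direct application of the 1D Wasserstein distance, computed per variable between source and target samples; small values justify freezing the transferred encoder. A sketch using SciPy (the per-column marginal comparison is an assumption about how the check is organized):

```python
import numpy as np
from scipy.stats import wasserstein_distance

def domain_shift(source, target):
    """Per-variable empirical 1D Wasserstein distance between source
    (MERRA-2-like) and target (G5NR-like) samples. Distances near zero
    mean the coarse- and fine-scale marginals nearly coincide, so a
    parameter-frozen transfer of the pretrained encoder is low-risk."""
    return [wasserstein_distance(source[:, j], target[:, j])
            for j in range(source.shape[1])]
```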
https://arxiv.org/abs/2512.05139
This study presents a comprehensive deep learning pipeline for the automated classification of 12 foraminifera species using 2D micro-CT slices derived from 3D scans. We curated a scientifically rigorous dataset comprising 97 micro-CT scanned specimens across 27 species, selecting 12 species with sufficient representation for robust machine learning. To ensure methodological integrity and prevent data leakage, we employed specimen-level data splitting, resulting in 109,617 high-quality 2D slices (44,103 for training, 14,046 for validation, and 51,468 for testing). We evaluated seven state-of-the-art 2D convolutional neural network (CNN) architectures using transfer learning. Our final ensemble model, combining ConvNeXt-Large and EfficientNetV2-Small, achieved a test accuracy of 95.64%, with a top-3 accuracy of 99.6% and an area under the ROC curve (AUC) of 0.998 across all species. To facilitate practical deployment, we developed an interactive dashboard that supports real-time slice classification and 3D slice matching using advanced similarity metrics, including SSIM, NCC, and the Dice coefficient. This work establishes new benchmarks for AI-assisted micropaleontological identification and provides a fully reproducible framework for foraminifera classification research, bridging the gap between deep learning and applied geosciences.
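The final model is a soft-voting ensemble of two backbones, and the reported top-3 accuracy follows directly from the averaged class probabilities. A sketch (the equal 50/50 weighting of the two backbones is an assumption):

```python
import numpy as np

def ensemble_predict(probs_a, probs_b, w=0.5):
    """Soft-voting ensemble: average the class-probability outputs of
    two backbones (ConvNeXt-Large and EfficientNetV2-Small in the
    paper; any two (N, C) probability matrices here) and take argmax."""
    p = w * probs_a + (1.0 - w) * probs_b
    return p.argmax(axis=1), p

def top_k_accuracy(probs, labels, k=3):
    """Fraction of samples whose true label is among the k classes
    with the highest ensembled probability."""
    topk = np.argsort(-probs, axis=1)[:, :k]
    return float(np.mean([labels[i] in topk[i] for i in range(len(labels))]))
```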
https://arxiv.org/abs/2512.00912
In traditional EDA flows, layout-level performance metrics are only obtainable after placement and routing, hindering global optimization at earlier stages. Although some neural-network-based solutions predict layout-level performance directly from netlists, they often face generalization challenges due to the black-box heuristics of commercial placement-and-routing tools, which create disparate data across designs. To this end, we propose ParaGate, a three-step cross-stage prediction framework that infers layout-level timing and power from netlists. First, we propose a two-phase transfer-learning approach to predict parasitic parameters, pre-training on mid-scale circuits and fine-tuning on larger ones to capture extreme conditions. Next, we rely on EDA tools for timing analysis, offloading the long-path numerical reasoning. Finally, ParaGate performs global calibration using subgraph features. Experiments show that ParaGate achieves strong generalization with minimal fine-tuning data: on openE906, its arrival-time R2 improves from 0.119 to 0.897. These results demonstrate that ParaGate could provide guidance for global optimization in the synthesis and placement stages.
https://arxiv.org/abs/2511.23340
Multi-aspect sentiment analysis of Bangla e-commerce reviews remains challenging due to limited annotated datasets, morphological complexity, code-mixing phenomena, and domain shift issues, affecting 300 million Bangla-speaking users. Existing approaches lack explainability and cross-domain generalization capabilities crucial for practical deployment. We present BanglaSentNet, an explainable hybrid deep learning framework integrating LSTM, BiLSTM, GRU, and BanglaBERT through dynamic weighted ensemble learning for multi-aspect sentiment classification. We introduce a dataset of 8,755 manually annotated Bangla product reviews across four aspects (Quality, Service, Price, Decoration) from major Bangladeshi e-commerce platforms. Our framework incorporates SHAP-based feature attribution and attention visualization for transparent insights. BanglaSentNet achieves 85% accuracy and 0.88 F1-score, outperforming standalone deep learning models by 3-7% and traditional approaches substantially. The explainability suite achieves 9.4/10 interpretability score with 87.6% human agreement. Cross-domain transfer learning experiments reveal robust generalization: zero-shot performance retains 67-76% effectiveness across diverse domains (BanglaBook reviews, social media, general e-commerce, news headlines); few-shot learning with 500-1000 samples achieves 90-95% of full fine-tuning performance, significantly reducing annotation costs. Real-world deployment demonstrates practical utility for Bangladeshi e-commerce platforms, enabling data-driven decision-making for pricing optimization, service improvement, and customer experience enhancement. This research establishes a new state-of-the-art benchmark for Bangla sentiment analysis, advances ensemble learning methodologies for low-resource languages, and provides actionable solutions for commercial applications.
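One plausible reading of the "dynamic weighted ensemble" over LSTM, BiLSTM, GRU, and BanglaBERT is to average each model's class-probability output with weights proportional to its validation score; the paper's exact weighting rule is not given in the abstract, so this is a hedged sketch only.

```python
import numpy as np

def dynamic_weighted_ensemble(prob_list, val_scores):
    # Combine per-model class probabilities with weights proportional
    # to each model's validation score (e.g., validation F1).
    w = np.asarray(val_scores, dtype=float)
    w = w / w.sum()
    stacked = np.stack(prob_list)            # (models, samples, classes)
    return np.tensordot(w, stacked, axes=1)  # (samples, classes)
```

Because every input row is a probability distribution, the weighted combination still sums to 1 per sample and can be argmax-ed directly for the final aspect-level sentiment label.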
https://arxiv.org/abs/2511.23264
Software requirement documents (RDs) typically contain tens of thousands of individual requirements, and ensuring consistency among these requirements is critical for the success of software engineering projects. Automated detection methods can significantly enhance efficiency and reduce costs; however, existing approaches still face several challenges, including low detection accuracy on imbalanced data, limited semantic extraction due to the use of a single encoder, and suboptimal performance in cross-domain transfer learning. To address these issues, this paper proposes a Transferable Software Requirement Conflict Detection Framework based on SBERT and SimCSE, termed TSRCDF-SS. First, the framework employs two independent encoders, Sentence-BERT (SBERT) and Simple Contrastive Sentence Embedding (SimCSE), to generate sentence embeddings for requirement pairs, followed by a six-element concatenation strategy. Furthermore, the classifier is enhanced by a two-layer fully connected feedforward neural network (FFNN) with a hybrid loss optimization strategy that integrates a variant of Focal Loss, domain-specific constraints, and a confidence-based penalty term. Finally, the framework synergistically integrates sequential and cross-domain transfer learning. Experimental results demonstrate that the proposed framework achieves a 10.4% improvement in both macro-F1 and weighted-F1 scores in in-domain settings, and an 11.4% increase in macro-F1 in cross-domain scenarios.
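Two pieces of this pipeline can be sketched concretely. The abstract does not list which six elements are concatenated, so the scheme below (both embeddings, their sum, difference, absolute difference, and elementwise product) is one plausible instantiation, not the paper's; likewise, the focal loss shown is the standard form, while TSRCDF-SS uses an unspecified variant augmented with domain constraints and a confidence penalty.

```python
import numpy as np

def six_element_concat(u, v):
    # Hypothetical six-element pair representation for two sentence
    # embeddings u, v (the paper's exact elements are not specified).
    return np.concatenate([u, v, u + v, u - v, np.abs(u - v), u * v])

def focal_loss(p, y, gamma=2.0, eps=1e-8):
    # Standard binary focal loss: down-weights easy examples, which is
    # why it helps on imbalanced conflict/no-conflict data.
    pt = np.where(y == 1, p, 1.0 - p)
    return float(np.mean(-((1.0 - pt) ** gamma) * np.log(pt + eps)))
```

The concatenated pair vector would be fed to the two-layer FFNN classifier; the (1 − pt)^γ factor makes confidently correct predictions contribute almost nothing to the loss, concentrating gradient signal on the rare conflicting pairs.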
https://arxiv.org/abs/2511.23007