Imaging sites around the world generate growing amounts of medical scan data with ever more versatile and affordable technology. Large-scale studies acquire MRI for tens of thousands of participants, together with metadata ranging from lifestyle questionnaires to biochemical assays, genetic analyses and more. These large datasets encode substantial information about human health and hold considerable potential for machine learning training and analysis. This chapter examines ongoing large-scale studies and the challenge of distribution shifts between them. Transfer learning for overcoming such shifts is discussed, together with federated learning for secure access to distributed training data held at multiple institutions. Finally, representation learning is reviewed as a methodology for encoding embeddings that express abstract relationships in multi-modal input formats.
https://arxiv.org/abs/2404.14326
We introduce a new area of study in the field of educational Natural Language Processing: Automated Long Answer Grading (ALAG). Distinguishing itself from Automated Short Answer Grading (ASAG) and Automated Essay Grading (AEG), ALAG presents unique challenges due to the complexity and multifaceted nature of fact-based long answers. To study ALAG, we introduce RiceChem, a dataset derived from a college chemistry course, featuring real student responses to long-answer questions with an average word count notably higher than typical ASAG datasets. We propose a novel approach to ALAG by formulating it as a rubric entailment problem, employing natural language inference models to verify whether each criterion, represented by a rubric item, is addressed in the student's response. This formulation enables the effective use of MNLI for transfer learning, significantly improving the performance of models on the RiceChem dataset. We demonstrate the importance of rubric-based formulation in ALAG, showcasing its superiority over traditional score-based approaches in capturing the nuances of student responses. We also investigate the performance of models in cold start scenarios, providing valuable insights into the practical deployment considerations in educational settings. Lastly, we benchmark state-of-the-art open-sourced Large Language Models (LLMs) on RiceChem and compare their results to GPT models, highlighting the increased complexity of ALAG compared to ASAG. Despite leveraging the benefits of a rubric-based approach and transfer learning from MNLI, the lower performance of LLMs on RiceChem underscores the significant difficulty posed by the ALAG task. With this work, we offer a fresh perspective on grading long, fact-based answers and introduce a new dataset to stimulate further research in this important area. Code: \url{this https URL}.
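The rubric-entailment formulation described above can be sketched in a few lines. In this sketch, `nli_entails` is a toy keyword heuristic standing in for a real NLI model (e.g. one fine-tuned on MNLI), and the rubric items and point values are invented for illustration; the actual RiceChem pipeline is not reproduced here.

```python
# Sketch: grade a long answer by checking whether each rubric item is
# entailed by the student's response. `nli_entails` is a stand-in for a
# real NLI model; here it is a keyword heuristic so the example runs alone.

def nli_entails(premise: str, hypothesis: str) -> bool:
    """Toy entailment check: does the premise contain the hypothesis' keywords?"""
    keywords = [w for w in hypothesis.lower().split() if len(w) > 3]
    return all(k in premise.lower() for k in keywords)

def grade_long_answer(student_response: str, rubric: list) -> float:
    """Score = sum of points for rubric items entailed by the response."""
    score = 0.0
    for criterion, points in rubric:
        if nli_entails(student_response, criterion):
            score += points
    return score

# Hypothetical rubric for a chemistry question (invented for illustration).
rubric = [
    ("electronegativity difference", 2.0),
    ("ionic bonding", 2.0),
]
response = "The large electronegativity difference between the atoms leads to ionic bonding."
print(grade_long_answer(response, rubric))  # prints 4.0
```

Replacing the heuristic with an MNLI-fine-tuned model per rubric item is the transfer-learning step the abstract describes.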
https://arxiv.org/abs/2404.14316
Recent advances in generative visual models and neural radiance fields have greatly boosted 3D-aware image synthesis and stylization tasks. However, previous NeRF-based work is limited to single-scene stylization; training a model to generate 3D-aware cartoon faces with arbitrary styles remains an open problem. We propose ArtNeRF, a novel face stylization framework derived from a 3D-aware GAN, to tackle this problem. In this framework, we utilize an expressive generator to synthesize stylized faces and a triple-branch discriminator module to improve the visual quality and style consistency of the generated faces. Specifically, a style encoder based on contrastive learning is leveraged to extract robust low-dimensional embeddings of style images, empowering the generator with the knowledge of various styles. To smooth the training process of cross-domain transfer learning, we propose an adaptive style blending module which helps inject style information and allows users to freely tune the level of stylization. We further introduce a neural rendering module to achieve efficient real-time rendering of images with higher resolutions. Extensive experiments demonstrate that ArtNeRF is versatile in generating high-quality 3D-aware cartoon faces with arbitrary styles.
https://arxiv.org/abs/2404.13711
Federated Learning (FL) has emerged as a prominent privacy-preserving technique for enabling use cases like confidential clinical machine learning. FL operates by aggregating models trained by the remote devices that own the data. Thus, FL enables the training of powerful global models using crowd-sourced data from a large number of learners, without compromising their privacy. However, the aggregating server is a single point of failure when generating the global model. Moreover, the performance of the model suffers when the data is not independent and identically distributed (non-IID) across the remote devices. This leads to vastly different models being aggregated, which can reduce the performance by as much as 50% in certain scenarios. In this paper, we seek to address the aforementioned issues while retaining the benefits of FL. We propose MultiConfederated Learning: a decentralized FL framework designed to handle non-IID data. Unlike traditional FL, MultiConfederated Learning maintains multiple models in parallel (instead of a single global model) to help with convergence when the data is non-IID. With the help of transfer learning, learners can converge to fewer models. In order to increase adaptability, learners are allowed to choose which updates to aggregate from their peers.
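The selective peer aggregation described above can be illustrated with a minimal sketch. Models are represented as flat weight vectors and a learner averages its own weights with only the peer updates it accepts; the real MultiConfederated Learning protocol (multiple parallel models, transfer learning between them) is not reproduced here.

```python
# Sketch: decentralized aggregation where each learner chooses which peer
# updates to average, in the spirit of the selective aggregation above.

def average_weights(models):
    """Coordinate-wise mean of several weight vectors."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]

def aggregate_round(own, peers, accept):
    """Aggregate own weights with only the accepted peers' weight vectors."""
    chosen = [own] + [w for name, w in peers.items() if name in accept]
    return average_weights(chosen)

own = [1.0, 2.0]
# Peer "b" has diverged sharply (e.g. trained on non-IID data), so this
# learner accepts only "a"'s update.
peers = {"a": [3.0, 4.0], "b": [100.0, 100.0]}
print(aggregate_round(own, peers, accept={"a"}))  # prints [2.0, 3.0]
```

Rejecting divergent peers is what keeps a single averaged model from being pulled toward incompatible non-IID updates.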
https://arxiv.org/abs/2404.13421
Ultrasonic metal welding (UMW) is a key joining technology with widespread industrial applications. Condition monitoring (CM) capabilities are critically needed in UMW applications because process anomalies significantly deteriorate the joining quality. Recently, machine learning models emerged as a promising tool for CM in many manufacturing applications due to their ability to learn complex patterns. Yet, the successful deployment of these models requires substantial training data that may be expensive and time-consuming to collect. Additionally, many existing machine learning models lack generalizability and cannot be directly applied to new process configurations (i.e., domains). Such issues may be potentially alleviated by pooling data across manufacturers, but data sharing raises critical data privacy concerns. To address these challenges, this paper presents a Federated Transfer Learning with Task Personalization (FTL-TP) framework that provides domain generalization capabilities in distributed learning while ensuring data privacy. By effectively learning a unified representation from feature space, FTL-TP can adapt CM models for clients working on similar tasks, thereby enhancing their overall adaptability and performance jointly. To demonstrate the effectiveness of FTL-TP, we investigate two distinct UMW CM tasks, tool condition monitoring and workpiece surface condition classification. Compared with state-of-the-art FL algorithms, FTL-TP achieves a 5.35%--8.08% improvement of accuracy in CM in new target domains. FTL-TP is also shown to perform excellently in challenging scenarios involving unbalanced data distributions and limited client fractions. Furthermore, by implementing the FTL-TP method on an edge-cloud architecture, we show that this method is both viable and efficient in practice. The FTL-TP framework is readily extensible to various other manufacturing applications.
https://arxiv.org/abs/2404.13278
Artificial intelligence supports healthcare professionals with predictive modeling, greatly transforming clinical decision-making. This study addresses the crucial need for fairness and explainability in AI applications within healthcare to ensure equitable outcomes across diverse patient demographics. By focusing on the predictive modeling of sepsis-related mortality, we propose a method that learns a performance-optimized predictive model and then employs the transfer learning process to produce a model with better fairness. Our method also introduces a novel permutation-based feature importance algorithm aiming at elucidating the contribution of each feature in enhancing fairness on predictions. Unlike existing explainability methods concentrating on explaining feature contribution to predictive performance, our proposed method uniquely bridges the gap in understanding how each feature contributes to fairness. This advancement is pivotal, given sepsis's significant mortality rate and its role in one-third of hospital deaths. Our method not only aids in identifying and mitigating biases within the predictive model but also fosters trust among healthcare stakeholders by improving the transparency and fairness of model predictions, thereby contributing to more equitable and trustworthy healthcare delivery.
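The permutation-based fairness importance idea can be sketched as follows: shuffle one feature column, re-predict, and measure how much a fairness gap changes. The classifier and fairness metric below (a threshold rule and a demographic parity gap) are simplified stand-ins, not the paper's model or algorithm.

```python
import random

# Sketch: permutation-based *fairness* importance. Instead of measuring the
# drop in accuracy after permuting a feature (classic permutation importance),
# we measure the change in a fairness gap.

def predict(rows):
    """Toy classifier: predicts 1 when feature 0 exceeds a threshold."""
    return [1 if r[0] > 0.5 else 0 for r in rows]

def parity_gap(preds, groups):
    """Demographic parity gap: |P(pred=1 | g=0) - P(pred=1 | g=1)|."""
    def rate(g):
        return sum(p for p, gr in zip(preds, groups) if gr == g) / groups.count(g)
    return abs(rate(0) - rate(1))

def fairness_importance(rows, groups, feature, seed=0):
    """Change in parity gap after permuting one feature column."""
    base = parity_gap(predict(rows), groups)
    rng = random.Random(seed)
    col = [r[feature] for r in rows]
    rng.shuffle(col)
    permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, col)]
    return parity_gap(predict(permuted), groups) - base

# Feature 0 is perfectly aligned with group membership, so the base gap is
# maximal and permuting the feature can only shrink (or keep) the gap.
rows = [[0.9], [0.8], [0.1], [0.2]]
groups = [0, 0, 1, 1]
print(fairness_importance(rows, groups, feature=0))
```

A large (negative) change signals that the permuted feature drives the disparity, which is the per-feature fairness contribution the abstract aims to expose.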
https://arxiv.org/abs/2404.13139
Adapter-based parameter-efficient transfer learning has achieved exciting results in vision-language models. Traditional adapter methods often require training or fine-tuning, facing challenges such as insufficient samples or resource limitations. While some methods overcome the need for training by leveraging image modality cache and retrieval, they overlook the text modality's importance and cross-modal cues for the efficient adaptation of parameters in visual-language models. This work introduces a cross-modal parameter-efficient approach named XMAdapter. XMAdapter establishes cache models for both text and image modalities. It then leverages retrieval through visual-language bimodal information to gather clues for inference. By dynamically adjusting the affinity ratio, it achieves cross-modal fusion, decoupling different modal similarities to assess their respective contributions. Additionally, it explores hard samples based on differences in cross-modal affinity and enhances model performance through adaptive adjustment of sample learning intensity. Extensive experimental results on benchmark datasets demonstrate that XMAdapter outperforms previous adapter-based methods significantly regarding accuracy, generalization, and efficiency.
https://arxiv.org/abs/2404.12588
Adversarial attacks pose significant challenges to deep neural networks (DNNs) such as Transformer models in natural language processing (NLP). This paper introduces a novel defense strategy, called GenFighter, which enhances adversarial robustness by learning and reasoning on the training classification distribution. GenFighter identifies potentially malicious instances deviating from the distribution, transforms them into semantically equivalent instances aligned with the training data, and employs ensemble techniques for a unified and robust response. By conducting extensive experiments, we show that GenFighter outperforms state-of-the-art defenses in accuracy under attack and attack success rate metrics. Additionally, it requires a high number of queries per attack, making the attack more challenging in real scenarios. The ablation study shows that our approach integrates transfer learning, a generative/evolutive procedure, and an ensemble method, providing an effective defense against NLP adversarial attacks.
https://arxiv.org/abs/2404.11538
Lung diseases remain a critical global health concern, making accurate and rapid diagnosis essential. This work focuses on classifying different lung diseases into five groups: viral pneumonia, bacterial pneumonia, COVID, tuberculosis, and normal lungs. Employing advanced deep learning techniques, we explore a diverse range of models including CNNs, hybrid models, ensembles, transformers, and Big Transfer. The research encompasses comprehensive methodologies such as hyperparameter tuning, stratified k-fold cross-validation, and transfer learning with fine-tuning. Remarkably, our findings reveal that the Xception model, fine-tuned through 5-fold cross-validation, achieves the highest accuracy of 96.21\%. This success shows that our methods work well in accurately identifying different lung diseases. The exploration of explainable artificial intelligence (XAI) methodologies further enhances our understanding of the decision-making processes employed by these models, contributing to increased trust in their clinical applications.
https://arxiv.org/abs/2404.11428
A significant challenge in the field of object detection lies in the system's performance under non-ideal imaging conditions, such as rain, fog, low illumination, or raw Bayer images that lack ISP processing. Our study introduces "Feature Corrective Transfer Learning", a novel approach that leverages transfer learning and a bespoke loss function to facilitate the end-to-end detection of objects in these challenging scenarios without the need to convert non-ideal images into their RGB counterparts. In our methodology, we initially train a comprehensive model on a pristine RGB image dataset. Subsequently, non-ideal images are processed by comparing their feature maps against those from the initial ideal RGB model. This comparison employs the Extended Area Novel Structural Discrepancy Loss (EANSDL), a novel loss function designed to quantify similarities and integrate them into the detection loss. This approach refines the model's ability to perform object detection across varying conditions through direct feature map correction, encapsulating the essence of Feature Corrective Transfer Learning. Experimental validation on variants of the KITTI dataset demonstrates a significant improvement in mean Average Precision (mAP), resulting in a 3.8-8.1% relative enhancement in detection under non-ideal conditions compared to the baseline model, while remaining within 1.3% of the mAP@[0.5:0.95] achieved under ideal conditions by the standard Faster R-CNN algorithm.
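The loss structure described above (detection loss plus a feature-map correction term) can be sketched minimally. EANSDL's exact form is not public in this abstract, so a plain mean-squared feature-map discrepancy stands in for it; the feature maps are flattened lists for illustration.

```python
# Sketch: augment a detection loss with a feature-map discrepancy term, in
# the spirit of Feature Corrective Transfer Learning. The mean-squared
# discrepancy below is a stand-in for EANSDL, whose exact form is not shown.

def feature_discrepancy(fmap_ideal, fmap_nonideal):
    """Mean squared difference between two flattened feature maps."""
    diffs = [(a - b) ** 2 for a, b in zip(fmap_ideal, fmap_nonideal)]
    return sum(diffs) / len(diffs)

def total_loss(detection_loss, fmap_ideal, fmap_nonideal, weight=0.1):
    """Detection loss plus a weighted feature-correction term."""
    return detection_loss + weight * feature_discrepancy(fmap_ideal, fmap_nonideal)

# Non-ideal features [0.0, 0.0] vs ideal features [0.0, 1.0]:
# discrepancy = 0.5, so total = 1.0 + 0.1 * 0.5 = 1.05.
print(total_loss(1.0, [0.0, 1.0], [0.0, 0.0]))  # prints 1.05
```

Minimizing the second term pulls the non-ideal branch's features toward the frozen ideal-RGB model's features, which is the "direct feature map correction" the abstract names.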
https://arxiv.org/abs/2404.11214
Invasive ductal carcinoma (IDC) is the most prevalent form of breast cancer. Breast tissue histopathological examination is critical in diagnosing and classifying breast cancer. Although existing methods have shown promising results, there is still room for improvement in the classification accuracy and generalization of IDC using histopathology images. We present a novel approach, Supervised Contrastive Vision Transformer (SupCon-ViT), for improving the classification of invasive ductal carcinoma in terms of accuracy and generalization by leveraging the inherent strengths and advantages of both transfer learning, i.e., pre-trained vision transformer, and supervised contrastive learning. Our results on a benchmark breast cancer dataset demonstrate that SupCon-ViT achieves state-of-the-art performance in IDC classification, with an F1-score of 0.8188, precision of 0.7692, and specificity of 0.8971, outperforming existing methods. In addition, the proposed model demonstrates resilience in scenarios with minimal labeled data, making it highly efficient in real-world clinical settings where labeled data is limited. Our findings suggest that supervised contrastive learning in conjunction with pre-trained vision transformers appears to be a viable strategy for an accurate classification of IDC, thus paving the way for a more efficient and reliable diagnosis of breast cancer through histopathological image analysis.
https://arxiv.org/abs/2404.11052
Earth structural heterogeneities have a remarkable role in the petroleum economy for both exploration and production projects. Automatic detection of detailed structural heterogeneities is challenging when considering modern machine learning techniques like deep neural networks. Typically, these techniques can be an excellent tool for assisted interpretation of such heterogeneities, but they depend heavily on the amount of training data available. We propose an efficient and cost-effective architecture for detecting seismic structural heterogeneities using Convolutional Neural Networks (CNNs) combined with Attention layers. The attention mechanism reduces costs and enhances accuracy, even in cases with relatively noisy data. Our model has half the parameters compared to the state-of-the-art, and it outperforms previous methods in terms of Intersection over Union (IoU) by 0.6% and precision by 0.4%. By leveraging synthetic data, we apply transfer learning to train and fine-tune the model, addressing the challenge of limited annotated data availability.
https://arxiv.org/abs/2404.10170
Automated medical diagnosis through image-based neural networks has increased in popularity and matured over the years. Nevertheless, it is confined by the scarcity of medical images and expensive labor annotation costs. Self-Supervised Learning (SSL) is a good alternative to Transfer Learning (TL) and is suitable for imbalanced image datasets. In this study, we assess four pretrained SSL models and two TL models on classifying treatable retinal diseases using small-scale Optical Coherence Tomography (OCT) training sets ranging from 125 to 4,000 images with balanced or imbalanced distributions. The proposed SSL model achieves state-of-the-art accuracy of 98.84% using only 4,000 training images. Our results suggest the SSL models provide superior performance under both the balanced and imbalanced training scenarios. The SSL model with the MoCo-v2 scheme has consistently good performance under the imbalanced scenario and, especially, surpasses the other models when the training set contains fewer than 500 images.
https://arxiv.org/abs/2404.10166
Pre-trained large-scale vision-language models (VLMs) have acquired profound understanding of general visual concepts. Recent advancements in efficient transfer learning (ETL) have shown remarkable success in fine-tuning VLMs within the scenario of limited data, introducing only a few parameters to harness task-specific insights from VLMs. Despite significant progress, current leading ETL methods tend to overfit the narrow distributions of base classes seen during training and encounter two primary challenges: (i) only utilizing uni-modal information to model task-specific knowledge; and (ii) using costly and time-consuming methods to supplement knowledge. To address these issues, we propose a Conditional Prototype Rectification Prompt Learning (CPR) method to correct the bias of base examples and augment limited data in an effective way. Specifically, we alleviate overfitting on base classes from two aspects. First, each input image acquires knowledge from both textual and visual prototypes, and then generates sample-conditional text tokens. Second, we extract utilizable knowledge from unlabeled data to further refine the prototypes. These two strategies mitigate biases stemming from base classes, yielding a more effective classifier. Extensive experiments on 11 benchmark datasets show that our CPR achieves state-of-the-art performance on both few-shot classification and base-to-new generalization tasks. Our code is available at \url{this https URL}.
https://arxiv.org/abs/2404.09872
Low-resource named entity recognition is still an open problem in NLP. Most state-of-the-art systems require tens of thousands of annotated sentences in order to obtain high performance. However, for most of the world's languages, it is unfeasible to obtain such annotation. In this paper, we present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities for both high-resource languages and low resource languages jointly. Learning character representations for multiple related languages allows transfer among the languages, improving F1 by up to 9.8 points over a loglinear CRF baseline.
https://arxiv.org/abs/2404.09383
To address the issues of limited samples, time-consuming feature design, and low accuracy in detection and classification of breast cancer pathological images, a breast cancer image classification model algorithm combining deep learning and transfer learning is proposed. This algorithm is based on the DenseNet structure of deep neural networks, and constructs a network model by introducing attention mechanisms, and trains the enhanced dataset using multi-level transfer learning. Experimental results demonstrate that the algorithm achieves an efficiency of over 84.0\% in the test set, with a significantly improved classification accuracy compared to previous models, making it applicable to medical breast cancer detection tasks.
https://arxiv.org/abs/2404.09226
We present an innovative integration of artificial intelligence with column chromatography, aiming to resolve inefficiencies and standardize data collection in the chemical separation and purification domain. By developing an automated platform for precise data acquisition and employing advanced machine learning algorithms, we constructed predictive models to forecast key separation parameters, thereby enhancing the efficiency and quality of chromatographic processes. The application of transfer learning allows the model to adapt across various column specifications, broadening its utility. A novel metric, separation probability ($S_p$), quantifies the likelihood of effective compound separation, validated through experimental verification. This study signifies a significant step forward in the application of AI in chemical research, offering a scalable solution to traditional chromatography challenges and providing a foundation for future technological advancements in chemical analysis and purification.
https://arxiv.org/abs/2404.09114
Prior computer vision research extensively explores adapting pre-trained vision transformers (ViT) to downstream tasks. However, the substantial number of parameters requiring adaptation has led to a focus on Parameter Efficient Transfer Learning (PETL) as an approach to efficiently adapt large pre-trained models by training only a subset of parameters, achieving both parameter and storage efficiency. Although the significantly reduced parameters have shown promising performance under transfer learning scenarios, the structural redundancy inherent in the model still leaves room for improvement, which warrants further investigation. In this paper, we propose Head-level Efficient Adaptation with Taylor-expansion importance score (HEAT): a simple method that efficiently fine-tunes ViTs at the head level. In particular, the first-order Taylor expansion is employed to calculate each head's importance score, termed Taylor-expansion Importance Score (TIS), indicating its contribution to specific tasks. Additionally, three strategies for calculating TIS have been employed to maximize its effectiveness. These strategies calculate TIS from different perspectives, reflecting varying contributions of parameters. Besides ViT, HEAT has also been applied to hierarchical transformers such as the Swin Transformer, demonstrating its versatility across different transformer architectures. Through extensive experiments, HEAT has demonstrated superior performance over state-of-the-art PETL methods on the VTAB-1K benchmark.
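A first-order Taylor importance score of the kind described above is commonly computed as the magnitude of the gradient-parameter inner product over a unit's weights. The sketch below applies that idea per attention head; gradients are supplied directly since no autograd framework is assumed, and the exact TIS variants of the paper are not reproduced.

```python
# Sketch: first-order Taylor importance for attention heads. The loss change
# from removing a head is approximated by |sum_i g_i * w_i| over that head's
# parameters, a standard first-order Taylor estimate.

def taylor_importance(head_params, head_grads):
    """|gradient . parameters| over one head's weights."""
    return abs(sum(g * w for g, w in zip(head_grads, head_params)))

def rank_heads(params_per_head, grads_per_head):
    """Return head indices sorted by descending importance score."""
    scores = [taylor_importance(p, g)
              for p, g in zip(params_per_head, grads_per_head)]
    return sorted(range(len(scores)), key=lambda i: -scores[i])

# Two hypothetical heads: head 0 has large weights and gradients, head 1 is
# nearly inert, so head 0 should rank first.
params = [[1.0, 2.0], [0.1, 0.1]]
grads = [[0.5, 0.5], [0.01, 0.01]]
print(rank_heads(params, grads))  # prints [0, 1]
```

Ranking heads this way lets a method fine-tune (or keep) only the heads whose removal would most change the loss, which is the head-level selection HEAT builds on.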
https://arxiv.org/abs/2404.08894
As generative AI progresses rapidly, new synthetic image generators continue to emerge at a swift pace. Traditional detection methods face two main challenges in adapting to these generators: the forensic traces of synthetic images from new techniques can vastly differ from those learned during training, and access to data for these new generators is often limited. To address these issues, we introduce the Ensemble of Expert Embedders (E3), a novel continual learning framework for updating synthetic image detectors. E3 enables the accurate detection of images from newly emerged generators using minimal training data. Our approach does this by first employing transfer learning to develop a suite of expert embedders, each specializing in the forensic traces of a specific generator. Then, all embeddings are jointly analyzed by an Expert Knowledge Fusion Network to produce accurate and reliable detection decisions. Our experiments demonstrate that E3 outperforms existing continual learning methods, including those developed specifically for synthetic image detection.
https://arxiv.org/abs/2404.08814
In recent years, we have seen many advancements in wood species identification. Methods like DNA analysis, Near Infrared (NIR) spectroscopy, and Direct Analysis in Real Time (DART) mass spectrometry complement the long-established wood anatomical assessment of cell and tissue morphology. However, most of these methods have some limitations such as high costs, the need for skilled experts for data interpretation, and the lack of good datasets for professional reference. Therefore, most of these methods, and certainly the wood anatomical assessment, may benefit from tools based on Artificial Intelligence. In this paper, we apply two transfer learning techniques with Convolutional Neural Networks (CNNs) to a multi-view Congolese wood species dataset including sections from different orientations and viewed at different microscopic magnifications. We explore two feature extraction methods in detail, namely Global Average Pooling (GAP) and Random Encoding of Aggregated Deep Activation Maps (RADAM), for efficient and accurate wood species identification. Our results indicate superior accuracy on diverse datasets and anatomical sections, surpassing the results of other methods. Our proposal represents a significant advancement in wood species identification, offering a robust tool to support the conservation of forest ecosystems and promote sustainable forestry practices.
https://arxiv.org/abs/2404.08585