The proliferation of fake news has emerged as a significant threat to the integrity of information dissemination, particularly on social media platforms. Misinformation can spread quickly due to the ease of creating and disseminating content, affecting public opinion and sociopolitical events. Identifying false information is therefore essential to reducing its negative consequences and maintaining the reliability of online news sources. Traditional approaches to fake news detection often rely solely on content-based features, overlooking the crucial role of social context in shaping the perception and propagation of news articles. In this paper, we propose a comprehensive approach that integrates social context-based features with news content features to enhance the accuracy of fake news detection in under-resourced languages. We perform several experiments using a variety of methodologies, including traditional machine learning, neural networks, ensemble learning, and transfer learning. Assessment of the experimental outcomes shows that the ensemble learning approach achieves the highest accuracy, with a 0.99 F1 score. Additionally, among the monolingual models compared, the model fine-tuned on the target language performed best, achieving a 0.94 F1 score. Finally, we use explainable AI techniques to analyze how the models function and which features contribute most to their performance.
https://arxiv.org/abs/2410.02609
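To make the feature-fusion idea concrete, here is a minimal scikit-learn sketch of a soft-voting ensemble over combined content and social-context features. The column names (`text`, `shares`, `comments`, `reactions`), the choice of ensemble members, and all hyperparameters are illustrative assumptions, not the paper's configuration.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

features = ColumnTransformer([
    # Content-based features: TF-IDF over the article text column.
    ("content", TfidfVectorizer(max_features=5000), "text"),
    # Social-context features: assumed engagement columns, standardized.
    ("social", StandardScaler(), ["shares", "comments", "reactions"]),
])

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=300)),
        ("gb", GradientBoostingClassifier()),
    ],
    voting="soft",  # average predicted class probabilities
)

model = Pipeline([("features", features), ("clf", ensemble)])
# Usage: model.fit(train_df, train_df["label"]) with a pandas DataFrame
# containing "text", "shares", "comments", "reactions", and "label".
```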
Recent advances in AI have led to significant results in robotic learning, but skills like grasping remain only partially solved. Many recent works exploit synthetic grasping datasets to learn to grasp unknown objects. However, those datasets were generated with simple, prior-based grasp sampling methods. Recently, Quality-Diversity (QD) algorithms have been shown to make grasp sampling significantly more efficient. In this work, we extend QDG-6DoF, a QD framework for generating object-centric grasps, to scale up the production of synthetic grasping datasets. We propose a data augmentation method that combines the transformation of object meshes with transfer learning from previous grasping repertoires. The conducted experiments show that this approach reduces the number of evaluations required per discovered robust grasp by up to 20%. We used this approach to generate QDGset, a dataset of 6DoF grasp poses that contains about 3.5 times more grasps and 4.5 times more objects than the previous state of the art. Our method allows anyone to easily generate data, eventually contributing to a large-scale collaborative dataset of synthetic grasps.
https://arxiv.org/abs/2410.02319
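A minimal sketch of the mesh-transformation side of such augmentation: under a uniform scaling of the object mesh, previously discovered grasp poses can be carried over as a starting repertoire instead of sampling from scratch. The (N, 4, 4) object-frame pose representation is an assumption for illustration, not QDG-6DoF's actual data format.

```python
import numpy as np

def scale_object_and_grasps(grasp_poses: np.ndarray, scale: float) -> np.ndarray:
    """grasp_poses: (N, 4, 4) homogeneous grasp poses in the object frame.

    Uniformly scaling the mesh by `scale` moves contact points radially,
    so grasp translations scale while orientations are preserved. The
    transferred grasps can then seed the QD search on the new object.
    """
    transferred = grasp_poses.copy()
    transferred[:, :3, 3] *= scale  # scale positions, keep rotations
    return transferred

# Example: reuse a repertoire on a 20% larger variant of the same mesh.
poses = np.tile(np.eye(4), (8, 1, 1))
poses[:, :3, 3] = np.random.uniform(-0.05, 0.05, size=(8, 3))
seed_repertoire = scale_object_and_grasps(poses, scale=1.2)
```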
Facial recognition using deep learning has been widely used in daily life for applications such as authentication, smart door locks, and photo grouping. More and more networks have been developed to facilitate computer vision tasks, such as ResNet, DenseNet, EfficientNet, ConvNeXt, and Siamese networks. However, few studies have systematically compared the advantages and disadvantages of such neural networks in identifying individuals from images, especially for pet animals like cats. In the present study, by systematically comparing the efficacy of different neural networks in cat recognition, we found that traditional CNNs trained with transfer learning perform better at individual cat recognition than models trained with the fine-tuning method or Siamese networks. In addition, ConvNeXt and DenseNet yield significant results that could be further optimized for individual cat recognition in pet stores and in the wild. These results provide a method to improve cat management in pet stores and the monitoring of cats in the wild.
https://arxiv.org/abs/2410.02305
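A minimal PyTorch sketch of the transfer-learning setup the study found strongest: a pre-trained backbone with frozen weights and a newly trained identity-classification head. The backbone choice and the number of cats are illustrative assumptions, not the study's exact configuration.

```python
import torch.nn as nn
from torchvision import models

num_cats = 30  # assumed number of individual cats to identify

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for param in backbone.parameters():
    param.requires_grad = False  # transfer learning: freeze the backbone

# Replace the ImageNet head with a cat-identity classifier; only this
# layer is trained. (Fine-tuning would instead unfreeze the backbone.)
backbone.fc = nn.Linear(backbone.fc.in_features, num_cats)
```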
Integrating artificial intelligence into modern society is profoundly transformative, significantly enhancing productivity by streamlining various daily tasks. AI-driven recognition systems provide notable advantages in the food sector, including improved nutrient tracking, tackling food waste, and boosting food production and consumption efficiency. Accurate food classification is a crucial initial step in utilizing advanced AI models, as the effectiveness of this process directly influences the success of subsequent operations; therefore, achieving high accuracy at a reasonable speed is essential. Despite existing research efforts, a gap persists in improving performance while ensuring rapid processing times, prompting researchers to pursue cost-effective and precise models. This study addresses this gap by employing the state-of-the-art EfficientNetB7 architecture, enhanced through transfer learning, data augmentation, and the CBAM attention module. The result is a robust model that surpasses previous studies in accuracy while maintaining the rapid processing suitable for real-world applications. The Food11 dataset from Kaggle was utilized, comprising 16,643 imbalanced images across 11 diverse classes with significant intra-category diversity and inter-category similarity. The proposed methodology, bolstered by various deep learning techniques, consistently achieves an impressive average accuracy of 96.40%. Notably, it can classify over 60 images within one second during inference on unseen data, demonstrating its ability to deliver high accuracy promptly. This underscores its potential for practical applications in accurate food classification and for enhancing the efficiency of subsequent processes.
https://arxiv.org/abs/2410.02304
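For reference, here is a sketch of the CBAM attention module mentioned above: channel attention computed from average- and max-pooled descriptors through a shared MLP, followed by spatial attention from a 7x7 convolution over stacked channel statistics. The reduction ratio and kernel size are CBAM's common defaults, not values taken from this paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        # Channel attention: shared MLP over avg- and max-pooled features.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: conv over stacked channel-wise avg/max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size,
                                 padding=kernel_size // 2, bias=False)

    def forward(self, x):
        ca = torch.sigmoid(
            self.mlp(x.mean((2, 3), keepdim=True)) +
            self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca  # re-weight channels
        sa = torch.sigmoid(self.spatial(
            torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa  # re-weight spatial locations
```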
In-context learning (ICL) is an effective approach to help large language models (LLMs) adapt to various tasks by providing demonstrations of the target task. Considering the high cost of labeling demonstrations, many methods propose synthesizing demonstrations from scratch using LLMs. However, the quality of the demonstrations synthesized from scratch is limited by the capabilities and knowledge of LLMs. To address this, inspired by transfer learning, we propose In-Context Transfer Learning (ICTL), which synthesizes target task demonstrations by transferring labeled demonstrations from similar source tasks. ICTL consists of two steps: source sampling and target transfer. First, we define an optimization objective, which minimizes transfer error to sample source demonstrations similar to the target task. Then, we employ LLMs to transfer the sampled source demonstrations to the target task, matching the definition and format of the target task. Experiments on Super-NI show that ICTL outperforms synthesis from scratch by 2.0% on average, demonstrating the effectiveness of our method.
https://arxiv.org/abs/2410.01548
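A hypothetical sketch of the two ICTL steps. Source sampling is simplified here to cosine similarity between precomputed task-definition embeddings (the paper optimizes a transfer-error objective), and target transfer is shown as a prompt template asking an LLM to rewrite a source demonstration; the embedding computation and the LLM call themselves are assumed external.

```python
import numpy as np

def sample_sources(target_emb: np.ndarray, source_embs: np.ndarray, k: int = 4):
    """Pick the k source tasks whose definition embeddings are most
    similar to the target's (a stand-in for minimizing transfer error)."""
    sims = source_embs @ target_emb / (
        np.linalg.norm(source_embs, axis=1) * np.linalg.norm(target_emb))
    return np.argsort(-sims)[:k]

def transfer_prompt(target_definition: str, source_demo: dict) -> str:
    """Prompt asking an LLM to rewrite a labeled source demonstration so
    it matches the target task's definition and format."""
    return (
        f"Target task: {target_definition}\n"
        "Rewrite the following labeled example so that it matches the "
        "target task's definition and format.\n"
        f"Example input: {source_demo['input']}\n"
        f"Example output: {source_demo['output']}\n"
        "Rewritten example:"
    )
```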
Monkeypox (MPox) has emerged as a significant global concern, with cases steadily increasing daily. Conventional detection methods, including polymerase chain reaction (PCR) and manual examination, suffer from low sensitivity, high cost, and substantial workload. Deep learning offers an automated solution; however, the available datasets present data scarcity, texture and contrast variation, inter- and intra-class variability, and similarities with other infectious skin diseases. In this regard, a novel hybrid approach is proposed that integrates the learning capacity of a Residual Learning and Spatial Exploitation Convolutional Neural Network (CNN) with a customized Swin Transformer (RS-FME-SwinT) to capture multi-scale global and local correlated features for MPox diagnosis. The proposed RS-FME-SwinT technique employs a transfer learning-based feature map enhancement (FME) technique, integrating the customized SwinT for global information capture, residual blocks for texture extraction, and spatial blocks for local contrast variations. Moreover, new inverse residual blocks incorporated within the proposed SwinT effectively capture local patterns and mitigate vanishing gradients. RS-FME-SwinT thus learns diverse features that systematically reduce intra-class MPox variation and enable precise discrimination from other skin diseases. Finally, the proposed RS-FME-SwinT was holdout cross-validated on a diverse MPox dataset and outperformed state-of-the-art CNNs and ViTs, achieving an accuracy of 97.80%, a sensitivity of 96.82%, a precision of 98.06%, and an F-score of 97.44% in MPox detection. RS-FME-SwinT could be a valuable tool for healthcare practitioners, enabling prompt and accurate MPox diagnosis and contributing significantly to mitigation efforts.
https://arxiv.org/abs/2410.01216
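As background, an inverse (inverted) residual block of the kind the paper integrates follows the MobileNetV2 pattern: pointwise expansion, depthwise convolution, linear projection, and a skip connection. The sketch below uses illustrative sizes, not the paper's.

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),        # expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),              # depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),        # project
            nn.BatchNorm2d(channels),                          # linear bottleneck
        )

    def forward(self, x):
        # The residual path keeps gradients flowing, mitigating the
        # vanishing-gradient issue the abstract mentions.
        return x + self.block(x)
```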
This paper presents an Arabic Alphabet Sign Language recognition approach that uses deep learning methods in conjunction with transfer learning and transformer-based models. We study the performance of the different variants on two publicly available datasets, ArSL2018 and AASL. The approach makes full use of state-of-the-art CNN architectures like ResNet50, MobileNetV2, and EfficientNetB7, and the latest transformer models such as Google's ViT and Microsoft's Swin Transformer. These pre-trained models have been fine-tuned on the above datasets to capture unique features of Arabic sign language motions. Experimental results show that the suggested methodology achieves high recognition accuracy of up to 99.6% and 99.43% on ArSL2018 and AASL, respectively, far surpassing previously reported state-of-the-art approaches. This performance opens up further avenues for communication that may be more accessible to Arabic-speaking deaf and hard-of-hearing people, and thus encourages a more inclusive society.
https://arxiv.org/abs/2410.00681
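A minimal PyTorch sketch of the fine-tuning recipe described: load an ImageNet-pre-trained ViT, swap the head for the sign-alphabet classes, and update all weights with a small learning rate. The class count and learning rate are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 32  # assumed number of Arabic alphabet sign classes

model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, num_classes)

# Fine-tune all weights at a small learning rate so the pre-trained
# features adapt to sign-language images without being destroyed.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
```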
Molecules have a number of distinct properties whose importance and application vary. Often, in practice, labels for some properties are hard to obtain despite their practical importance. A common solution to such data scarcity is to use models with good generalization via transfer learning, which relies on domain experts to design source and target tasks that share features. However, this approach has limitations: (i) accurately designing source-target task pairs is difficult due to the large number of tasks, and (ii) verifying the many trial-and-error transfer learning designs carries a corresponding computational burden, thereby (iii) constraining the potential of foundation modeling for multi-task molecular property prediction. We address the limitations of manually designed transfer learning via data-driven bi-level optimization. The proposed method enables scalable multi-task transfer learning for molecular property prediction by automatically obtaining the optimal transfer ratios. Empirically, the proposed method improved the prediction performance of 40 molecular properties and accelerated training convergence.
https://arxiv.org/abs/2410.00432
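A schematic sketch of data-driven bi-level optimization of transfer ratios, under the simplifying assumption of a single differentiable inner SGD step: the model trains on a softmax-weighted sum of source-task losses, and the weights (the transfer ratios) are updated by backpropagating the target task's validation loss through that inner step. This illustrates the idea only; the paper's actual formulation may differ.

```python
import torch
from torch.func import functional_call

def bilevel_step(model, params, task_batches, target_val_batch,
                 w_logits, lr_inner, weight_opt, loss_fn):
    # Inner step: one differentiable SGD update on the weighted task loss.
    weights = torch.softmax(w_logits, dim=0)  # transfer ratios
    inner = sum(w * loss_fn(functional_call(model, params, (x,)), y)
                for w, (x, y) in zip(weights, task_batches))
    grads = torch.autograd.grad(inner, list(params.values()), create_graph=True)
    updated = {name: p - lr_inner * g
               for (name, p), g in zip(params.items(), grads)}

    # Outer step: the target validation loss, evaluated with the updated
    # parameters, backpropagates through the inner step to the ratios.
    xv, yv = target_val_batch
    outer = loss_fn(functional_call(model, updated, (xv,)), yv)
    weight_opt.zero_grad(); outer.backward(); weight_opt.step()
    return {n: p.detach().requires_grad_() for n, p in updated.items()}

# Illustrative setup: a tiny model, 3 source tasks, learnable ratios.
model = torch.nn.Linear(16, 1)
params = {n: p.detach().clone().requires_grad_()
          for n, p in model.named_parameters()}
w_logits = torch.zeros(3, requires_grad=True)
weight_opt = torch.optim.Adam([w_logits], lr=1e-2)
```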
Fire hazards are extremely dangerous, particularly in sectors such as the transportation industry, where political unrest increases the likelihood of their occurrence. By employing IP cameras to facilitate the setup of fire detection systems on transport vehicles, losses from fire events may be prevented proactively. However, the computational constraints of the embedded systems within these cameras require the development of lightweight fire detection models. In response to this difficulty, we introduce FireLite, a low-parameter convolutional neural network (CNN) designed for quick fire detection in contexts with limited resources. Our model, which has just 34,978 trainable parameters, achieves remarkable performance numbers: an accuracy of 98.77%, a validation loss of 8.74, and peak values of 98.77 for the precision, recall, and F1-score measures. Because of its precision and efficiency, FireLite is a promising solution for fire detection in resource-constrained environments.
https://arxiv.org/abs/2409.20384
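To illustrate how small such a detector can be, here is a sketch of a low-parameter CNN built from depthwise-separable blocks. The architecture is an assumption for illustration (FireLite's own design is not reproduced here), but it lands well under FireLite's 34,978 trainable parameters.

```python
import torch.nn as nn

def separable(cin: int, cout: int) -> nn.Sequential:
    """Depthwise-separable conv block: cheap in parameters."""
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, padding=1, groups=cin, bias=False),
        nn.Conv2d(cin, cout, 1, bias=False),
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True), nn.MaxPool2d(2),
    )

model = nn.Sequential(
    separable(3, 16), separable(16, 32), separable(32, 64),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 2),  # fire vs. no fire
)
print(sum(p.numel() for p in model.parameters() if p.requires_grad))
# -> a few thousand trainable parameters
```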
Existing unified methods typically treat multi-degradation image restoration as a multi-task learning problem. Despite performing effectively compared to single-degradation restoration methods, they overlook the commonalities and specificities within multi-task restoration, thereby impeding the model's performance. Inspired by the success of deep generative models and fine-tuning techniques, we propose a universal image restoration framework based on multiple low-rank adapters (LoRA) and multi-domain transfer learning. Our framework leverages a pre-trained generative model as the shared component for multi-degradation restoration and transfers it to specific degradation image restoration tasks using low-rank adaptation. Additionally, we introduce a LoRA composing strategy based on degradation similarity, which adaptively combines trained LoRAs and makes our model applicable to mixed-degradation restoration. Extensive experiments on multiple and mixed degradations demonstrate that the proposed universal image restoration method not only achieves higher fidelity and perceptual image quality but also generalizes better than other unified image restoration models. Our code is available at this https URL.
https://arxiv.org/abs/2409.20197
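A sketch of the core mechanism: a frozen pre-trained linear layer augmented with several LoRA adapters whose low-rank updates are mixed by degradation-similarity weights. The rank, scaling, and the source of the mixing weights are illustrative assumptions, not the paper's estimator.

```python
import torch
import torch.nn as nn

class MultiLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, num_adapters: int,
                 r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # shared pre-trained component stays frozen
        # One (B_k, A_k) low-rank pair per degradation type; B starts at
        # zero so each adapter initially leaves the base model unchanged.
        self.A = nn.Parameter(torch.randn(num_adapters, r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_adapters, base.out_features, r))
        self.scale = alpha / r

    def forward(self, x, weights):
        # weights: (num_adapters,) degradation-similarity mixing coefficients
        delta = torch.einsum("k,kor,kri->oi", weights, self.B, self.A)
        return self.base(x) + self.scale * (x @ delta.T)

# Usage sketch: layer = MultiLoRALinear(nn.Linear(512, 512), num_adapters=4)
# out = layer(x, weights=torch.tensor([0.7, 0.3, 0.0, 0.0]))
```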
Capitalizing on image-level pre-trained models for various downstream tasks has recently emerged with promising performance. However, the paradigm of "image pre-training followed by video fine-tuning" for high-dimensional video data inevitably poses significant performance bottlenecks. Furthermore, in the medical domain, many surgical video tasks encounter additional challenges posed by the limited availability of video data and the necessity for comprehensive spatial-temporal modeling. Recently, Parameter-Efficient Image-to-Video Transfer Learning has emerged as an efficient and effective paradigm for video action recognition tasks, employing image-level pre-trained models with promising feature transferability and involving cross-modality temporal modeling with minimal fine-tuning. Nevertheless, the effectiveness and generalizability of this paradigm within the intricate surgical domain remain unexplored. In this paper, we delve into the novel problem of efficiently adapting image-level pre-trained models to specialize in fine-grained surgical phase recognition, termed Parameter-Efficient Image-to-Surgical-Video Transfer Learning. First, we develop SurgPETL, a parameter-efficient transfer learning benchmark for surgical phase recognition, and conduct extensive experiments with three advanced methods based on ViTs of two distinct scales pre-trained on five large-scale natural and medical datasets. We then introduce the Spatial-Temporal Adaptation (STA) module, which integrates a standard spatial adapter with a novel temporal adapter to capture detailed spatial features and establish connections across temporal sequences for robust spatial-temporal modeling. Extensive experiments on three challenging datasets spanning various surgical procedures demonstrate the effectiveness of SurgPETL with STA.
https://arxiv.org/abs/2409.20083
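A sketch of what a spatial-temporal adapter of this kind can look like: a standard bottleneck adapter whose hidden features additionally pass through a depthwise convolution across frames. Dimensions are illustrative; this is a schematic reading of the STA idea, not the paper's implementation.

```python
import torch.nn as nn

class TemporalAdapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)        # down-projection
        self.temporal = nn.Conv1d(bottleneck, bottleneck, kernel_size=3,
                                  padding=1, groups=bottleneck)  # across time
        self.up = nn.Linear(bottleneck, dim)          # up-projection
        self.act = nn.GELU()

    def forward(self, x):
        # x: (batch, frames, dim) per-frame features from a frozen image ViT
        h = self.act(self.down(x))
        h = self.temporal(h.transpose(1, 2)).transpose(1, 2)
        return x + self.up(h)  # residual keeps the pre-trained features
```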
Transfer learning is a common practice that alleviates the need for extensive data when training neural networks. It is performed by pre-training a model on a source dataset and fine-tuning it for a target task. However, not every source dataset is appropriate for every target dataset, especially for time series. In this paper, we propose a novel method for selecting and using multiple datasets in transfer learning for time series classification. Specifically, our method combines multiple datasets into one source dataset for pre-training neural networks. Furthermore, to select multiple sources effectively, our method measures the transferability of datasets based on shapelet discovery. While traditional transferability measures require considerable time to pre-train all possible sources for each candidate architecture, our method requires a single simple computation that can be reused for every possible architecture. Using the proposed method, we demonstrate that it is possible to increase the performance of temporal convolutional neural networks (CNNs) on time series datasets.
https://arxiv.org/abs/2409.20005
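The key ingredient here is the shapelet-to-series distance: a shapelet's match to a series is its minimum distance over all sliding windows, and how well such features carry across datasets can serve as a transferability proxy. A minimal numpy sketch of that ingredient (not the paper's full measure):

```python
import numpy as np

def shapelet_distance(series: np.ndarray, shapelet: np.ndarray) -> float:
    """Minimum Euclidean distance between `shapelet` and any window of `series`."""
    m = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    return float(np.min(np.linalg.norm(windows - shapelet, axis=1)))

# Example: a shapelet discovered in one dataset can be scored against
# series from another dataset to estimate how well its patterns carry over.
rng = np.random.default_rng(0)
series = rng.normal(size=200)
shapelet = series[50:70] + rng.normal(scale=0.1, size=20)
print(shapelet_distance(series, shapelet))
```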
Medicine is inherently multimodal and multitask, with diverse data modalities spanning text and imaging. However, most models in the medical field are unimodal and single-task, and lack good generalizability and explainability. In this study, we introduce MedViLaM, a unified vision-language model working towards a generalist model for medical data, which can flexibly encode and interpret various forms of medical data, including clinical language and imaging, all using the same set of model weights. To facilitate the creation of such a multi-task model, we have curated MultiMedBench, a comprehensive pretraining dataset and benchmark consisting of several distinct tasks: continuous question answering, multi-label disease classification, disease localization, and generation and summarization of radiology reports. MedViLaM demonstrates strong performance across all MultiMedBench tasks, frequently outpacing other generalist models by a significant margin. Additionally, we present instances of zero-shot generalization to new medical concepts and tasks, effective transfer learning across different tasks, and the emergence of zero-shot medical reasoning.
https://arxiv.org/abs/2409.19684
Research findings associate 1p/19q co-deletion with clinical outcomes in low-grade gliomas. The ability to predict 1p/19q status is therefore critical for treatment planning and patient follow-up. This study aims to utilize a specially designed, MRI-based convolutional neural network for brain cancer detection. Although public networks such as ResNet and AlexNet can effectively diagnose brain cancers using transfer learning, such models include quite a few weights that have nothing to do with medical images, making the diagnostic results of a transfer learning model unreliable. To deal with this trustworthiness problem, we create the model from the ground up rather than depending on a pre-trained model. For flexibility, we combine stacked convolutions with dropout and fully connected operations, which improved performance by reducing overfitting. During model training, we also augment the given dataset and inject Gaussian noise. We use three-fold cross-validation to select the best-trained model. Compared with InceptionV3, VGG16, and MobileNetV2 fine-tuned from pre-trained models, our model produces better results. On a validation set of 125 codeletion vs. 31 non-codeletion images, the proposed network achieves a 96.37% F1-score, 97.46% precision, and 96.34% recall when classifying 1p/19q codeletion and non-codeletion images.
https://arxiv.org/abs/2409.19583
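A minimal PyTorch sketch of the training-time regularization described: Gaussian noise injected into the inputs plus dropout inside a from-scratch CNN. The noise level and layer sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AddGaussianNoise(nn.Module):
    """Inject Gaussian noise during training only (identity at eval time)."""
    def __init__(self, std: float = 0.05):
        super().__init__()
        self.std = std

    def forward(self, x):
        return x + torch.randn_like(x) * self.std if self.training else x

model = nn.Sequential(
    AddGaussianNoise(0.05),
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Dropout(0.5),                       # reduces overfitting
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),                      # codeletion vs. non-codeletion
)
```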
X-ray absorption spectroscopy (XAS) is a powerful characterization technique for probing the local chemical environment of absorbing atoms. However, analyzing XAS data presents significant challenges, often requiring extensive, computationally intensive simulations as well as significant domain expertise. These limitations hinder the development of fast, robust XAS analysis pipelines that are essential in high-throughput studies and for autonomous experimentation. We address these challenges with a suite of transfer learning approaches for XAS prediction, each uniquely contributing to improved accuracy and efficiency, as demonstrated on a K-edge spectra database covering eight 3d transition metals (Ti-Cu). Our framework is built upon three distinct strategies. First, we use M3GNet to derive latent representations of the local chemical environment of absorption sites as input for XAS prediction, achieving up to order-of-magnitude improvements over conventional featurization techniques. Second, we employ a hierarchical transfer learning strategy, training a universal multi-task model across elements before fine-tuning it for element-specific predictions. This cascaded approach, after element-wise fine-tuning, yields models that outperform element-specific models by up to 31%. Third, we implement cross-fidelity transfer learning, adapting a universal model to predict spectra generated by a simulation of a different fidelity with a much higher computational cost. This approach improves prediction accuracy by up to 24% over models trained on the target fidelity alone. Our approach is extendable to XAS prediction for a broader range of elements and offers a generalizable transfer learning framework to enhance other deep-learning models in materials science.
https://arxiv.org/abs/2409.19552
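A minimal sketch of the cross-fidelity strategy, assuming an MLP spectrum predictor and illustrative sizes: warm-start from the universal low-fidelity model and fine-tune on the scarce high-fidelity data at a reduced learning rate so low-fidelity knowledge is adapted rather than overwritten.

```python
import copy
import torch
import torch.nn as nn

feat_dim, spectrum_len = 64, 100  # assumed feature and spectrum grid sizes

# Universal model trained on abundant low-fidelity spectra (training not shown).
universal = nn.Sequential(nn.Linear(feat_dim, 256), nn.SiLU(),
                          nn.Linear(256, spectrum_len))

# Cross-fidelity transfer: copy the universal weights as a warm start,
# then fine-tune on the small high-fidelity set with a small learning rate.
high_fidelity_model = copy.deepcopy(universal)
optimizer = torch.optim.Adam(high_fidelity_model.parameters(), lr=1e-4)
```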
The escalating frequency and scale of recent malware attacks underscore the urgent need for swift and precise malware classification in the ever-evolving cybersecurity landscape. Key challenges include accurately categorizing closely related malware families. To tackle this evolving threat landscape, this paper proposes LeViT-MC, a novel architecture that produces state-of-the-art results in malware detection and classification while also being more time-efficient. LeViT-MC leverages a vision transformer-based architecture, an image-based visualization approach, and advanced transfer learning techniques. Experimental results on multi-class malware classification using the MaleVis dataset indicate LeViT-MC's significant advantage over existing models. This study underscores the critical importance of combining image-based and transfer learning techniques, with vision transformers at the forefront of the ongoing battle against evolving cyber threats.
https://arxiv.org/abs/2409.19461
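The image-based visualization step is commonly implemented by rendering a binary's raw bytes as a grayscale image, which the vision model then classifies. A minimal sketch follows; the fixed row width is a common heuristic, not the paper's specification.

```python
import numpy as np
from PIL import Image

def malware_to_image(path: str, width: int = 256) -> Image.Image:
    """Render the raw bytes of a binary as a grayscale image."""
    data = np.fromfile(path, dtype=np.uint8)   # one pixel per byte
    rows = len(data) // width
    return Image.fromarray(data[: rows * width].reshape(rows, width), mode="L")
```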
We explore the universality of neural encodings in convolutional neural networks trained on image classification tasks. We develop a procedure to directly compare the learned weights rather than their representations. It is based on a factorization of spatial and channel dimensions and measures the similarity of aligned weight covariances. We show that, for a range of layers of VGG-type networks, the learned eigenvectors appear to be universal across different natural image datasets. Our results suggest the existence of a universal neural encoding for natural images. They explain, at a more fundamental level, the success of transfer learning. Our work shows that, instead of aiming at maximizing the performance of neural networks, one can alternatively attempt to maximize the universality of the learned encoding, in order to build a principled foundation model.
https://arxiv.org/abs/2409.19460
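A simplified numpy reading of the comparison procedure: compute eigenvectors of a layer's weight covariance in each network, then measure the overlap of the leading eigenspaces via principal angles. This flattens the spatial/channel factorization for brevity, so treat it as a schematic rather than the paper's exact method.

```python
import numpy as np

def weight_covariance_eigvecs(weights: np.ndarray) -> np.ndarray:
    # weights: (out_ch, in_ch, kh, kw); treat each output filter as a
    # sample and compute the covariance over the flattened filter dims.
    w = weights.reshape(weights.shape[0], -1)
    cov = np.cov(w, rowvar=False)
    _, vecs = np.linalg.eigh(cov)
    return vecs[:, ::-1]  # eigenvectors, largest eigenvalue first

def subspace_overlap(v1: np.ndarray, v2: np.ndarray, k: int = 10) -> float:
    """Mean squared cosine of principal angles between top-k eigenspaces;
    1.0 means identical subspaces (a "universal" encoding), 0.0 disjoint."""
    s = np.linalg.svd(v1[:, :k].T @ v2[:, :k], compute_uv=False)
    return float(np.mean(s ** 2))
```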
Meta learning has been widely used to exploit rich-resource source tasks to improve the performance of low-resource target tasks. Unfortunately, most existing meta learning approaches treat different source tasks equally, ignoring the relatedness of source tasks to the target task in knowledge transfer. To mitigate this issue, we propose a reinforcement-based multi-source meta-transfer learning framework (Meta-RTL) for low-resource commonsense reasoning. In this framework, we present a reinforcement-based approach to dynamically estimating source task weights, which measure the contribution of the corresponding tasks to the target task, during meta-transfer learning. The differences between the general loss of the meta model and the task-specific losses of source-specific temporal meta models on sampled target data are fed into the policy network of the reinforcement learning module as rewards. The policy network is built upon LSTMs that capture long-term dependencies in source task weight estimation across meta learning iterations. We evaluate the proposed Meta-RTL using both BERT and ALBERT as the backbone of the meta model on three commonsense reasoning benchmark datasets. Experimental results demonstrate that Meta-RTL substantially outperforms strong baselines and previous task selection strategies, with larger improvements in extremely low-resource settings.
https://arxiv.org/abs/2409.19075
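A sketch of the reward-and-policy loop described above, with illustrative shapes: per-source rewards are the gap between the meta model's general loss and each source-specific model's loss on sampled target data, and an LSTM policy maps the reward history to source weights. This is a schematic of the idea, not the paper's implementation.

```python
import torch
import torch.nn as nn

class WeightPolicy(nn.Module):
    def __init__(self, num_sources: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_sources, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, num_sources)

    def forward(self, reward_history):
        # reward_history: (1, steps, num_sources) rewards across meta
        # iterations; the LSTM captures long-term dependencies in
        # source task weight estimation.
        out, _ = self.lstm(reward_history)
        return torch.softmax(self.head(out[:, -1]), dim=-1)  # source weights

def rewards(general_loss: torch.Tensor, source_losses: torch.Tensor):
    # Positive reward: a source-specific model beats the shared baseline
    # on the sampled target data, so that source should be up-weighted.
    return general_loss - source_losses  # (num_sources,)
```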
The difficulty of acquiring abundant, high-quality data, especially in multi-lingual contexts, has sparked interest in addressing low-resource scenarios. Moreover, the current literature relies on fixed expressions derived from language IDs, which results in inadequate learning of language representations and failure to generate speech in unseen languages. To address these challenges, we propose a novel method that directly extracts linguistic features from audio input while effectively filtering out miscellaneous acoustic information, including speaker-specific attributes such as timbre. Subjective and objective evaluations affirm the effectiveness of our approach for multi-lingual text-to-speech and highlight its superiority in low-resource transfer learning for previously unseen languages.
https://arxiv.org/abs/2409.18622
Self-supervised pre-training has proven highly effective for many computer vision tasks, particularly when labelled data are scarce. In the context of Earth Observation (EO), foundation models and various other Vision Transformer (ViT)-based approaches have been successfully applied for transfer learning to downstream tasks. However, it remains unclear under which conditions pre-trained models offer significant advantages over training from scratch. In this study, we investigate the effectiveness of pre-training ViT-based Masked Autoencoders (MAE) for downstream EO tasks, focusing on reconstruction, segmentation, and classification. We consider two large ViT-based MAE pre-trained models: a foundation model (Prithvi) and SatMAE. We evaluate Prithvi on reconstruction and segmentation-based downstream tasks, and for SatMAE we assess its performance on a classification downstream task. Our findings suggest that pre-training is particularly beneficial when the fine-tuning task closely resembles the pre-training task, e.g. reconstruction. In contrast, for tasks such as segmentation or classification, training from scratch with specific hyperparameter adjustments proved to be equally or more effective.
https://arxiv.org/abs/2409.18536
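For context, the core of MAE pre-training is random patch masking, with the encoder seeing only the visible patches and the decoder reconstructing the rest. A minimal sketch follows; the 75% ratio is the common MAE default, not necessarily what these models use.

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    # patches: (batch, num_patches, dim) patch embeddings
    b, n, d = patches.shape
    keep = int(n * (1 - mask_ratio))
    idx = torch.rand(b, n).argsort(dim=1)[:, :keep]     # patches to keep
    visible = torch.gather(patches, 1,
                           idx.unsqueeze(-1).expand(-1, -1, d))
    return visible, idx  # encoder sees only `visible`; decoder fills the rest
```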