Image segmentation is one of the most fundamental problems in computer vision and has drawn a lot of attention due to its vast applications in image understanding and autonomous driving. However, designing effective and efficient segmentation neural architectures is a labor-intensive process that may require numerous trials by human experts. In this paper, we address the challenge of efficiently integrating multi-head self-attention into high-resolution representation CNNs by leveraging architecture search. Manually replacing convolution layers with multi-head self-attention is non-trivial due to the costly memory overhead of maintaining high resolution. By contrast, we develop a multi-target multi-branch supernet method, which not only fully utilizes the advantages of high-resolution features, but also finds the proper location for placing the multi-head self-attention module. Our search algorithm is optimized towards multiple objectives (e.g., latency and mIoU) and is capable of finding architectures on the Pareto frontier with an arbitrary number of branches in a single search. We further present a series of models via the Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method, which searches for the best hybrid combination of light-weight convolution layers and memory-efficient self-attention layers across branches at different resolutions and fuses them to high resolution for both efficiency and effectiveness. Extensive experiments demonstrate that HyCTAS outperforms previous methods on the semantic segmentation task. Code and models are available at \url{this https URL}.
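The block below is a minimal, illustrative sketch (not the authors' code) of the kind of hybrid module HyCTAS searches over: a light-weight depthwise-separable convolution path combined with a memory-efficient self-attention path whose keys and values come from a spatially reduced feature map. Channel sizes, the reduction ratio, and the residual fusion are assumptions.

```python
# Illustrative hybrid block (not the authors' code): light-weight convolution
# path + memory-efficient self-attention whose keys/values come from a
# spatially reduced map, fused residually.
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    def __init__(self, channels: int, heads: int = 4, reduction: int = 4):
        super().__init__()
        # Light-weight (depthwise separable) convolution branch.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Keys/values from a downsampled map, so attention cost scales with HW/r^2.
        self.reduce = nn.Conv2d(channels, channels, reduction, stride=reduction)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)         # (B, HW, C)
        kv = self.reduce(x).flatten(2).transpose(1, 2)
        attn_out, _ = self.attn(self.norm(q), kv, kv)
        attn_out = attn_out.transpose(1, 2).reshape(b, c, h, w)
        return x + self.conv(x) + attn_out       # fuse both paths residually

x = torch.randn(2, 64, 32, 64)
print(HybridBlock(64)(x).shape)                  # torch.Size([2, 64, 32, 64])
```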
https://arxiv.org/abs/2403.10413
How well the heart is functioning can be quantified through measurements of myocardial deformation via echocardiography. Clinical assessment of cardiac function is generally focused on global indices of relative shortening; however, territorial and segmental strain indices have been shown to be abnormal in regions of myocardial disease, such as scar. In this work, we propose a single framework to predict myocardial disease substrates at global, territorial, and segmental levels using regional myocardial strain traces as input to a convolutional neural network (CNN)-based classification algorithm. An anatomically meaningful transformation of the input data from the clinically standard bullseye representation to a multi-channel 2D image is proposed to formulate the task as an image classification problem, thus enabling the use of state-of-the-art neural network configurations. A Fully Convolutional Network (FCN) is trained to detect and localize myocardial scar from regional left ventricular (LV) strain patterns. Simulated regional strain data from a controlled dataset of virtual patients with varying degrees and locations of myocardial scar is used for training and validation. The proposed method successfully detects and localizes the scars on 98% of the 5490 left ventricle (LV) segments of the 305 patients in the test set using strain traces only. Because scar is sparse, only 10% of the LV segments in the virtual patient cohort contain scar; taking this imbalance into account, the class-balanced accuracy is 95%. Performance is reported at global, territorial, and segmental levels. The proposed method proves successful on the strain traces of the virtual cohort and offers the potential to solve the regional myocardial scar detection problem on the strain traces of real patient cohorts.
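As a rough illustration of the input encoding and the segment-level prediction head, the sketch below arranges per-segment strain traces into a (segments x time) image and scores each segment with a small fully convolutional network; the 17-segment AHA layout, trace length, and layer sizes are assumptions rather than the paper's exact configuration.

```python
# Illustrative sketch only: per-segment strain traces as a (segments x time)
# image, scored per segment by a small fully convolutional network.
import torch
import torch.nn as nn

N_SEGMENTS, N_FRAMES = 17, 60           # AHA segments x time samples (assumed)

class SegmentScarFCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=(1, 7), padding=(0, 3)), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=(3, 7), padding=(1, 3)), nn.ReLU(),
            nn.AdaptiveAvgPool2d((N_SEGMENTS, 1)),    # pool over time only
            nn.Conv2d(32, 1, kernel_size=1),          # one scar logit per segment
        )

    def forward(self, strain):           # strain: (B, 1, 17, T)
        return self.net(strain).squeeze(-1).squeeze(1)   # (B, 17) logits

traces = torch.randn(8, 1, N_SEGMENTS, N_FRAMES)
print(SegmentScarFCN()(traces).shape)    # torch.Size([8, 17])
```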
https://arxiv.org/abs/2403.10291
In today's technology-driven era, the imperative for predictive maintenance and advanced diagnostics extends beyond aviation to encompass the identification of damage, failures, and operational defects in rotating and moving machines. Implementing such services not only curtails maintenance costs but also extends machine lifespan, ensuring heightened operational efficiency. Moreover, it serves as a preventive measure against potential accidents or catastrophic events. The advent of Artificial Intelligence (AI) has revolutionized maintenance across industries, enabling more accurate and efficient prediction and analysis of machine failures, thereby conserving time and resources. Our proposed study aims to delve into various machine learning classification techniques, including Support Vector Machine (SVM), Random Forest, Logistic Regression, and a Convolutional Neural Network-LSTM based model, for predicting and analyzing machine performance. SVM classifies data into different categories based on their positions in a multidimensional space, while Random Forest employs ensemble learning to create multiple decision trees for classification. Logistic Regression predicts the probability of binary outcomes from input data. The primary objective of the study is to assess these algorithms' performance in predicting and analyzing machine performance, considering metrics such as accuracy, precision, recall, and F1 score. The findings will aid maintenance experts in selecting the most suitable machine learning algorithm for effective prediction and analysis of machine performance.
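A minimal sketch of such a comparison using scikit-learn is shown below; the feature matrix, labels, and hyperparameters are placeholders rather than the study's data or settings.

```python
# Placeholder data and settings; compares SVM, Random Forest, and Logistic
# Regression on accuracy, precision, recall, and F1.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X = np.random.rand(1000, 8)                  # sensor features (placeholder)
y = (X[:, 0] + 0.3 * np.random.rand(1000) > 0.7).astype(int)   # failure label
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
    "LogReg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(name,
          f"acc={accuracy_score(y_te, pred):.3f}",
          f"prec={precision_score(y_te, pred):.3f}",
          f"rec={recall_score(y_te, pred):.3f}",
          f"f1={f1_score(y_te, pred):.3f}")
```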
https://arxiv.org/abs/2403.10259
This paper introduces a novel Functional Graph Convolutional Network (funGCN) framework that combines Functional Data Analysis and Graph Convolutional Networks to address the complexities of multi-task and multi-modal learning in digital health and longitudinal studies. With the growing importance of health solutions to improve health care and social support, ensure healthy lives, and promote well-being at all ages, funGCN offers a unified approach to handle multivariate longitudinal data for multiple entities and ensures interpretability even with small sample sizes. Key innovations include task-specific embedding components that manage different data types, the ability to perform classification, regression, and forecasting, and the creation of a knowledge graph for insightful data interpretation. The efficacy of funGCN is validated through simulation experiments and a real-data application.
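The toy sketch below illustrates two generic ingredients mentioned above, not funGCN itself: each longitudinal variable is embedded into a fixed-length vector via a crude basis projection, and a single graph-convolution step propagates information over a variable graph. The basis choice, graph, and dimensions are arbitrary assumptions.

```python
# Generic illustration only: curve embedding + one graph convolution.
import numpy as np

def embed_curve(curve, n_coef=5):
    """Embed a sampled curve via a crude cosine-basis projection."""
    t = np.linspace(0, np.pi, len(curve))
    basis = np.stack([np.cos(k * t) for k in range(n_coef)])   # (n_coef, T)
    return basis @ curve / len(curve)                          # (n_coef,)

def gcn_layer(X, A, W):
    """One graph convolution: normalized adjacency times features times weights."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(1)
    A_norm = A_hat / np.sqrt(np.outer(d, d))
    return np.maximum(A_norm @ X @ W, 0.0)                     # ReLU

curves = np.random.rand(6, 100)                  # 6 variables observed over time
X = np.stack([embed_curve(c) for c in curves])   # (6, 5) node features
A = (np.random.rand(6, 6) > 0.5).astype(float); A = np.triu(A, 1); A = A + A.T
W = np.random.randn(5, 4)
print(gcn_layer(X, A, W).shape)                  # (6, 4)
```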
https://arxiv.org/abs/2403.10158
The standard approach to tackling computer vision problems is to train deep convolutional neural network (CNN) models using large-scale image datasets which are representative of the target task. However, in many scenarios, it is often challenging to obtain sufficient image data for the target task. Data augmentation is a way to mitigate this challenge. A common practice is to explicitly transform existing images in desired ways so as to create the required volume and variability of training data necessary to achieve good generalization performance. In situations where data for the target domain is not accessible, a viable workaround is to synthesize training data from scratch--i.e., synthetic data augmentation. This paper presents an extensive review of synthetic data augmentation techniques. It covers data synthesis approaches based on realistic 3D graphics modeling, neural style transfer (NST), differential neural rendering, and generative artificial intelligence (AI) techniques such as generative adversarial networks (GANs) and variational autoencoders (VAEs). For each of these classes of methods, we focus on the important data generation and augmentation techniques, general scope of application and specific use-cases, as well as existing limitations and possible workarounds. Additionally, we provide a summary of common synthetic datasets for training computer vision models, highlighting the main features, application domains and supported tasks. Finally, we discuss the effectiveness of synthetic data augmentation methods. Since this is the first paper to explore synthetic data augmentation methods in great detail, we are hoping to equip readers with the necessary background information and in-depth knowledge of existing methods and their attendant issues.
https://arxiv.org/abs/2403.10075
Hyperspectral image (HSI) denoising is critical for the effective analysis and interpretation of hyperspectral data. However, simultaneously modeling global and local features to enhance HSI denoising has rarely been explored. In this letter, we propose a hybrid convolution and attention network (HCANet), which leverages the strengths of both convolutional neural networks (CNNs) and Transformers. To enhance the modeling of both global and local features, we devise a convolution and attention fusion module aimed at capturing long-range dependencies and neighborhood spectral correlations. Furthermore, to improve multi-scale information aggregation, we design a multi-scale feed-forward network that enhances denoising performance by extracting features at different scales. Experimental results on mainstream HSI datasets demonstrate the rationality and effectiveness of the proposed HCANet. The proposed model is effective in removing various types of complex noise. Our codes are available at \url{this https URL}.
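As a hedged illustration of the multi-scale feed-forward idea (not the paper's exact design), the sketch below uses parallel depthwise convolutions with different kernel sizes inside a residual feed-forward block; kernel sizes and the expansion ratio are assumptions.

```python
# Illustrative multi-scale feed-forward block: parallel depthwise convolutions
# at several kernel sizes, summed, then projected back, with a residual path.
import torch
import torch.nn as nn

class MultiScaleFFN(nn.Module):
    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        hidden = channels * expansion
        self.expand = nn.Conv2d(channels, hidden, 1)
        self.branches = nn.ModuleList([
            nn.Conv2d(hidden, hidden, k, padding=k // 2, groups=hidden)
            for k in (3, 5, 7)                  # three spatial scales
        ])
        self.project = nn.Conv2d(hidden, channels, 1)
        self.act = nn.GELU()

    def forward(self, x):                       # x: (B, C, H, W)
        h = self.act(self.expand(x))
        h = sum(branch(h) for branch in self.branches)
        return x + self.project(self.act(h))    # residual connection

hsi_feat = torch.randn(1, 48, 64, 64)           # HSI feature map (placeholder)
print(MultiScaleFFN(48)(hsi_feat).shape)        # torch.Size([1, 48, 64, 64])
```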
https://arxiv.org/abs/2403.10067
Conversational Aspect-Based Sentiment Analysis (DiaASQ) aims to detect quadruples \{target, aspect, opinion, sentiment polarity\} from given dialogues. In DiaASQ, elements constituting these quadruples are not necessarily confined to individual sentences but may span multiple utterances within a dialogue. This necessitates a dual focus on both the syntactic information of individual utterances and the semantic interaction among them. However, previous studies have primarily focused on coarse-grained relationships between utterances, thus overlooking the potential benefits of detailed intra-utterance syntactic information and the granularity of inter-utterance relationships. This paper introduces the Triple GNNs network to enhance DiaASQ. It employs a Graph Convolutional Network (GCN) for modeling syntactic dependencies within utterances and a Dual Graph Attention Network (DualGATs) to construct interactions between utterances. Experiments on two standard datasets reveal that our model significantly outperforms state-of-the-art baselines. The code is available at \url{this https URL}.
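The sketch below illustrates only the intra-utterance ingredient named above: a single GCN layer that propagates token features along a syntactic-dependency adjacency matrix. The dialogue-level DualGAT interaction and quadruple decoding are omitted, and all tensors are placeholders.

```python
# Minimal syntax-GCN layer over a dependency adjacency matrix (illustrative).
import torch
import torch.nn as nn

class SyntaxGCNLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, tokens, adj):             # tokens: (B, N, D), adj: (B, N, N)
        adj = adj + torch.eye(adj.size(-1), device=adj.device)   # add self-loops
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.linear(adj @ tokens) / deg)       # mean aggregation

tokens = torch.randn(2, 12, 64)                 # 12 tokens per utterance
adj = (torch.rand(2, 12, 12) > 0.8).float()     # dependency arcs (placeholder)
print(SyntaxGCNLayer(64)(tokens, adj).shape)    # torch.Size([2, 12, 64])
```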
https://arxiv.org/abs/2403.10065
This paper presents a novel Fully Binary Point Cloud Transformer (FBPT) model which has the potential to be widely applied and expanded in the fields of robotics and mobile devices. By compressing the weights and activations of a 32-bit full-precision network to 1-bit binary values, the proposed binary point cloud Transformer network significantly reduces the storage footprint and computational resource requirements of neural network models for point cloud processing tasks, compared to full-precision point cloud networks. However, achieving a fully binary point cloud Transformer network, where all parts except the task-specific modules are binary, poses challenges and bottlenecks in quantizing the activations of Q, K, V and self-attention in the attention module, as they do not adhere to simple probability distributions and can vary with input data. Furthermore, in our network, the binary attention module suffers a degradation of self-attention due to the near-uniform distribution that emerges after the softmax operation. The primary focus of this paper is on addressing the performance degradation caused by the use of binary point cloud Transformer modules. We propose a novel binarization mechanism called dynamic-static hybridization. Specifically, our approach combines static binarization of the overall network model with fine-grained dynamic binarization of data-sensitive components. Furthermore, we make use of a novel hierarchical training scheme to obtain the optimal model and binarization parameters. These improvements allow the proposed binarization method to outperform binarization methods designed for convolutional neural networks when applied to point cloud Transformer structures. To demonstrate the superiority of our algorithm, we conducted experiments on two different tasks: point cloud classification and place recognition.
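A minimal sketch of the basic ingredient, 1-bit quantization with a straight-through estimator, is given below; the optional per-row scale only hints at the dynamic part of dynamic-static hybridization and is an assumption, not the authors' implementation.

```python
# 1-bit quantization with a straight-through estimator (STE); illustrative only.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)                        # values in {-1, +1}

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()    # pass gradients inside [-1, 1]

def binarize(x, dynamic_scale=False):
    # Static: one scale per tensor; "dynamic": data-dependent per-row scale
    # (a stand-in for the data-sensitive components such as Q/K/V).
    scale = x.abs().mean() if not dynamic_scale else x.abs().mean(dim=-1, keepdim=True)
    return BinarizeSTE.apply(x) * scale             # binary values + real-valued scale

w = torch.randn(64, 64, requires_grad=True)
q = torch.randn(2, 16, 64)                          # attention queries (placeholder)
print(binarize(w).unique().numel(), binarize(q, dynamic_scale=True).shape)
```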
https://arxiv.org/abs/2403.09998
We present GazeMotion, a novel method for human motion forecasting that combines information on past human poses with human eye gaze. Inspired by evidence from the behavioural sciences showing that human eye and body movements are closely coordinated, GazeMotion first predicts future eye gaze from past gaze, then fuses the predicted future gaze and past poses into a gaze-pose graph, and finally uses a residual graph convolutional network to forecast body motion. We extensively evaluate our method on the MoGaze, ADT, and GIMO benchmark datasets and show that it outperforms state-of-the-art methods by up to 7.4% in mean per-joint position error. Using head direction as a proxy for gaze, our method still achieves an average improvement of 5.5%. We finally report an online user study showing that our method also outperforms prior methods in terms of perceived realism. These results show the significant information content available in eye gaze for human motion forecasting as well as the effectiveness of our method in exploiting this information.
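The sketch below is an illustrative guess at the final stage only: past joint positions plus predicted gaze form node features of a gaze-pose graph, and a residual graph convolution regresses future poses. The joint count, horizon, learned dense adjacency, and the way gaze is injected are all assumptions.

```python
# Illustrative residual graph-convolutional forecaster over a gaze-pose graph.
import torch
import torch.nn as nn

N_JOINTS, PAST, FUTURE = 21, 10, 5

class ResidualGCNForecaster(nn.Module):
    def __init__(self):
        super().__init__()
        self.adj = nn.Parameter(torch.ones(N_JOINTS + 1, N_JOINTS + 1) / (N_JOINTS + 1))
        self.fc1 = nn.Linear(PAST * 3, 128)
        self.fc2 = nn.Linear(128, FUTURE * 3)

    def forward(self, pose_past, gaze_future):
        # pose_past: (B, N_JOINTS, PAST, 3); gaze_future: (B, FUTURE, 3)
        b = pose_past.size(0)
        gaze_node = torch.zeros(b, 1, PAST, 3, device=pose_past.device)
        gaze_node[:, 0, -FUTURE:, :] = gaze_future            # inject gaze into one node
        nodes = torch.cat([pose_past, gaze_node], dim=1).flatten(2)   # (B, N+1, PAST*3)
        h = torch.relu(self.fc1(self.adj @ nodes))
        out = self.fc2(self.adj @ h)[:, :N_JOINTS]             # drop the gaze node
        # residual: forecast offsets from the last observed pose
        return pose_past[:, :, -1:, :] + out.view(b, N_JOINTS, FUTURE, 3)

pose = torch.randn(4, N_JOINTS, PAST, 3)
gaze = torch.randn(4, FUTURE, 3)
print(ResidualGCNForecaster()(pose, gaze).shape)   # torch.Size([4, 21, 5, 3])
```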
https://arxiv.org/abs/2403.09885
Uninterrupted optical image time series are crucial for the timely monitoring of agricultural land changes. However, the continuity of such time series is often disrupted by clouds. In response to this challenge, we propose a deep learning method that integrates cloud-free optical (Sentinel-2) observations and weather-independent (Sentinel-1) Synthetic Aperture Radar (SAR) data, using a combined Convolutional Neural Network (CNN)-Recurrent Neural Network (RNN) architecture to generate continuous Normalized Difference Vegetation Index (NDVI) time series. We emphasize the significance of observation continuity by assessing the impact of the generated time series on the detection of grassland mowing events. We focus on Lithuania, a country characterized by extensive cloud coverage, and compare our approach with alternative interpolation techniques (i.e., linear, Akima, quadratic). Our method surpasses these techniques, with an average MAE of 0.024 and R^2 of 0.92. It not only improves the accuracy of event detection tasks by employing a continuous time series, but also effectively filters out sudden shifts and noise originating from cloudy observations that cloud masks often fail to detect.
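A simplified sketch of such a CNN-RNN regressor is shown below: a small CNN encodes each acquisition of a patch time series, a GRU runs over the time dimension, and a linear head outputs NDVI per date. Band counts, patch size, and layer widths are placeholder assumptions.

```python
# Illustrative CNN-RNN regressor for per-date NDVI from stacked SAR + optical bands.
import torch
import torch.nn as nn

class CnnRnnNdvi(nn.Module):
    def __init__(self, in_bands: int = 12, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_bands, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),              # (B*T, 32)
        )
        self.rnn = nn.GRU(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                # x: (B, T, bands, H, W)
        b, t = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1)).view(b, t, -1)    # (B, T, 32)
        out, _ = self.rnn(feats)
        return self.head(out).squeeze(-1)                       # (B, T) NDVI

series = torch.randn(2, 24, 12, 9, 9)    # 24 dates, assumed 10 optical + 2 SAR bands
print(CnnRnnNdvi()(series).shape)         # torch.Size([2, 24])
```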
https://arxiv.org/abs/2403.09554
Skeleton-based action recognition, which classifies human actions based on the coordinates of joints and their connectivity within skeleton data, is widely utilized in various scenarios. While Graph Convolutional Networks (GCNs) have been proposed for skeleton data represented as graphs, they suffer from limited receptive fields constrained by joint connectivity. To address this limitation, recent advancements have introduced transformer-based methods. However, capturing correlations between all joints in all frames requires substantial memory resources. To alleviate this, we propose a novel approach called Skeletal-Temporal Transformer (SkateFormer) that partitions joints and frames based on different types of skeletal-temporal relation (Skate-Type) and performs skeletal-temporal self-attention (Skate-MSA) within each partition. We categorize the key skeletal-temporal relations for action recognition into a total of four distinct types. These types combine (i) two skeletal relation types based on physically neighboring and distant joints, and (ii) two temporal relation types based on neighboring and distant frames. Through this partition-specific attention strategy, our SkateFormer can selectively focus on key joints and frames crucial for action recognition in an action-adaptive manner with efficient computation. Extensive experiments on various benchmark datasets validate that our SkateFormer outperforms recent state-of-the-art methods.
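The sketch below illustrates the partition idea for just one of the four relation types (neighboring joints within neighboring frames): tokens on a (frames x joints) grid are regrouped into local windows and standard multi-head self-attention runs inside each window. Window sizes and dimensions are assumptions.

```python
# Window-partitioned skeletal-temporal attention (one partition type, illustrative).
import torch
import torch.nn as nn

def windowed_attention(x, attn, frame_win=8, joint_win=5):
    # x: (B, T, J, C) with T and J divisible by the window sizes
    b, t, j, c = x.shape
    x = x.view(b, t // frame_win, frame_win, j // joint_win, joint_win, c)
    x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, frame_win * joint_win, c)
    out, _ = attn(x, x, x)                        # attention within each window only
    out = out.view(b, t // frame_win, j // joint_win, frame_win, joint_win, c)
    return out.permute(0, 1, 3, 2, 4, 5).reshape(b, t, j, c)

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
tokens = torch.randn(2, 64, 25, 64)               # 64 frames, 25 joints, dim 64
print(windowed_attention(tokens, attn).shape)     # torch.Size([2, 64, 25, 64])
```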
https://arxiv.org/abs/2403.09508
Recent advancements in remote sensing (RS) technologies have shown their potential in accurately classifying local climate zones (LCZs). However, traditional scene-level methods using convolutional neural networks (CNNs) often struggle to integrate prior knowledge of ground objects effectively. Moreover, commonly utilized data sources like Sentinel-2 encounter difficulties in capturing detailed ground object information. To tackle these challenges, we propose a data fusion method that integrates ground object priors extracted from high-resolution Google imagery with Sentinel-2 multispectral imagery. The proposed method introduces a novel Dual-stream Fusion framework for LCZ classification (DF4LCZ), integrating instance-based location features from Google imagery with the scene-level spatial-spectral features extracted from Sentinel-2 imagery. The framework incorporates a Graph Convolutional Network (GCN) module empowered by the Segment Anything Model (SAM) to enhance feature extraction from Google imagery. Simultaneously, the framework employs a 3D-CNN architecture to learn the spectral-spatial features of Sentinel-2 imagery. Experiments are conducted on a multi-source remote sensing image dataset specifically designed for LCZ classification, validating the effectiveness of the proposed DF4LCZ. The related code and dataset are available at this https URL.
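A rough sketch of the dual-stream fusion is given below: one stream aggregates instance features with a graph convolution, the other extracts spectral-spatial features from the Sentinel-2 cube with a 3D CNN, and the two vectors are concatenated for 17-class LCZ prediction. All sizes and the simple mean-pooling fusion are assumptions.

```python
# Illustrative dual-stream fusion: instance-graph stream + 3D-CNN spectral-spatial stream.
import torch
import torch.nn as nn

class DualStreamLCZ(nn.Module):
    def __init__(self, n_classes: int = 17, inst_dim: int = 32):
        super().__init__()
        self.gcn_w = nn.Linear(inst_dim, 64)                 # instance-graph stream
        self.cnn3d = nn.Sequential(                           # spectral-spatial stream
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64 + 8, n_classes)

    def forward(self, inst_feats, adj, s2_patch):
        # inst_feats: (B, N, inst_dim) instance features, adj: (B, N, N)
        # s2_patch:  (B, 1, bands, H, W) Sentinel-2 cube
        g = torch.relu(self.gcn_w(adj @ inst_feats)).mean(dim=1)    # (B, 64)
        s = self.cnn3d(s2_patch)                                     # (B, 8)
        return self.classifier(torch.cat([g, s], dim=-1))

inst = torch.randn(4, 10, 32); adj = torch.rand(4, 10, 10)
s2 = torch.randn(4, 1, 10, 32, 32)
print(DualStreamLCZ()(inst, adj, s2).shape)       # torch.Size([4, 17])
```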
https://arxiv.org/abs/2403.09367
Recent advancements in state space models, notably Mamba, have demonstrated significant progress in modeling long sequences for tasks like language understanding. Yet, their application in vision tasks has not markedly surpassed the performance of traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). This paper posits that the key to enhancing Vision Mamba (ViM) lies in optimizing scan directions for sequence modeling. Traditional ViM approaches, which flatten spatial tokens, overlook the preservation of local 2D dependencies, thereby elongating the distance between adjacent tokens. We introduce a novel local scanning strategy that divides images into distinct windows, effectively capturing local dependencies while maintaining a global perspective. Additionally, acknowledging the varying preferences for scan patterns across different network layers, we propose a dynamic method to independently search for the optimal scan choices for each layer, substantially improving performance. Extensive experiments across both plain and hierarchical models underscore our approach's superiority in effectively capturing image representations. For example, our model significantly outperforms Vim-Ti by 3.1% on ImageNet with the same 1.5G FLOPs. Code is available at: this https URL.
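The snippet below illustrates the local scanning idea only: instead of raster-flattening an H x W token grid, tokens are visited window by window so that spatially adjacent tokens remain adjacent in the 1D sequence fed to the state-space model. The window size is an assumption, and the per-layer scan search is not shown.

```python
# Window-local scan order for image tokens (illustrative).
import numpy as np

def local_scan_order(h, w, win=2):
    """Return the permutation of raster indices for a window-local scan."""
    idx = np.arange(h * w).reshape(h, w)
    order = []
    for i in range(0, h, win):
        for j in range(0, w, win):
            order.extend(idx[i:i + win, j:j + win].flatten())
    return np.array(order)

order = local_scan_order(4, 4, win=2)
print(order)            # [ 0  1  4  5  2  3  6  7  8  9 12 13 10 11 14 15]
# tokens_local = tokens[:, order, :]   # reorder (B, HW, C) tokens before the SSM
```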
https://arxiv.org/abs/2403.09338
Accurately predicting the survival rate of cancer patients is crucial for aiding clinicians in planning appropriate treatment, reducing cancer-related medical expenses, and significantly enhancing patients' quality of life. Multimodal prediction of cancer patient survival offers a more comprehensive and precise approach. However, existing methods still grapple with challenges related to missing multimodal data and information interaction within modalities. This paper introduces SELECTOR, a heterogeneous graph-aware network based on convolutional mask encoders for robust multimodal prediction of cancer patient survival. SELECTOR comprises feature edge reconstruction, convolutional mask encoder, feature cross-fusion, and multimodal survival prediction modules. Initially, we construct a multimodal heterogeneous graph and employ the meta-path method for feature edge reconstruction, ensuring comprehensive incorporation of feature information from graph edges and effective embedding of nodes. To mitigate the impact of missing features within the modality on prediction accuracy, we devised a convolutional masked autoencoder (CMAE) to process the heterogeneous graph post-feature reconstruction. Subsequently, the feature cross-fusion module facilitates communication between modalities, ensuring that output features encompass all features of the modality and relevant information from other modalities. Extensive experiments and analysis on six cancer datasets from TCGA demonstrate that our method significantly outperforms state-of-the-art methods in both modality-missing and intra-modality information-confirmed cases. Our codes are made available at this https URL.
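As a toy illustration of the convolutional masked autoencoder component, the sketch below masks random entries of a per-patient feature vector, reconstructs them with a small 1D convolutional encoder-decoder, and computes the loss on masked positions only; the feature size and mask ratio are assumptions, and graph construction and cross-fusion are omitted.

```python
# Toy convolutional masked autoencoder: reconstruct masked feature entries.
import torch
import torch.nn as nn

class ConvMAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv1d(1, 16, 5, padding=2), nn.ReLU())
        self.decoder = nn.Conv1d(16, 1, 5, padding=2)

    def forward(self, x, mask):                 # x, mask: (B, n_feats)
        corrupted = (x * (1 - mask)).unsqueeze(1)           # zero out masked entries
        recon = self.decoder(self.encoder(corrupted)).squeeze(1)
        loss = ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1)
        return recon, loss

x = torch.randn(8, 128)
mask = (torch.rand(8, 128) < 0.3).float()       # 30% of entries hidden (assumed ratio)
recon, loss = ConvMAE()(x, mask)
print(recon.shape, float(loss))
```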
https://arxiv.org/abs/2403.09290
Capsule Neural Networks (CapsNets) are a novel architecture that utilizes vector-wise representations formed by multiple neurons. Specifically, Dynamic Routing CapsNets (DR-CapsNets) employ an affine matrix and a dynamic routing mechanism to train capsules and acquire translation-equivariance properties, enhancing their robustness compared to traditional Convolutional Neural Networks (CNNs). Echocardiograms, which capture moving images of the heart, present unique challenges for traditional image classification methods. In this paper, we explore the potential of DR-CapsNets and propose CardioCaps, a novel attention-based DR-CapsNet architecture for class-imbalanced echocardiogram classification. CardioCaps comprises two key components: a weighted margin loss incorporating a regression auxiliary loss, and an attention mechanism. First, the weighted margin loss prioritizes positive cases, supplemented by an auxiliary loss function based on the Ejection Fraction (EF) regression task, a crucial measure of cardiac function. This approach enhances the model's resilience in the face of class imbalance. Second, recognizing that the quadratic complexity of dynamic routing leads to training inefficiencies, we adopt the attention mechanism as a more computationally efficient alternative. Our results demonstrate that CardioCaps surpasses traditional machine learning baseline methods, including Logistic Regression, Random Forest, and XGBoost with sampling methods and a class weight matrix. Furthermore, CardioCaps outperforms other deep learning baseline methods such as CNNs, ResNets, U-Nets, and ViTs, as well as advanced CapsNets methods such as EM-CapsNets and Efficient-CapsNets. Notably, our model demonstrates robustness to class imbalance, achieving high precision even in datasets with a substantial proportion of negative cases.
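The sketch below illustrates the loss design described above under assumed values: a capsule margin loss with a larger weight on positive (diseased) cases plus a mean-squared auxiliary term on the predicted Ejection Fraction. Margins, the positive-class weight, and the auxiliary weight are assumptions, not the paper's settings.

```python
# Weighted capsule margin loss + EF regression auxiliary loss (illustrative values).
import torch

def cardiocaps_loss(caps_lengths, labels, ef_pred, ef_true,
                    m_pos=0.9, m_neg=0.1, lam=0.5, pos_weight=2.0, aux_weight=0.1):
    # caps_lengths: (B, n_classes) capsule vector norms; labels: (B,) class ids
    t = torch.nn.functional.one_hot(labels, caps_lengths.size(1)).float()
    present = torch.clamp(m_pos - caps_lengths, min=0) ** 2
    absent = torch.clamp(caps_lengths - m_neg, min=0) ** 2
    margin = t * present + lam * (1 - t) * absent
    class_w = (1.0 + (labels == 1).float() * (pos_weight - 1.0)).unsqueeze(1)  # up-weight positives
    ef_aux = torch.mean((ef_pred - ef_true) ** 2)
    return (class_w * margin).sum(dim=1).mean() + aux_weight * ef_aux

lengths = torch.rand(16, 2)                     # capsule norms for 2 classes
labels = torch.randint(0, 2, (16,))
loss = cardiocaps_loss(lengths, labels, torch.rand(16), torch.rand(16))
print(float(loss))
```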
Capsule Neural Networks (CapsNets) are a novel architecture that uses vector-wise representations formed by multiple neurons. Specifically, Dynamic Routing CapsNets (DR-CapsNets) employ an affine matrix and a dynamic routing mechanism to train capsules and obtain translation-equivariance properties, improving their robustness over traditional Convolutional Neural Networks (CNNs). Echocardiograms, which capture moving images of the heart, pose unique challenges for traditional image classification methods. In this paper, we explore the potential of DR-CapsNets and propose CardioCaps, an attention-based DR-CapsNet architecture for class-imbalanced echocardiogram classification. CardioCaps comprises two key components: a weighted margin loss incorporating a regression auxiliary loss, and an attention mechanism. First, the weighted margin loss prioritizes positive cases and is supplemented by an auxiliary loss based on the Ejection Fraction (EF) regression task, a key measure of cardiac function; this strengthens the model under class imbalance. Second, given the training inefficiency caused by the quadratic complexity of dynamic routing, we adopt the attention mechanism as a more computationally efficient alternative. Our results show that CardioCaps surpasses traditional machine learning baselines, including Logistic Regression, Random Forest, and XGBoost with sampling methods and a class weight matrix. CardioCaps also outperforms deep learning baselines such as CNNs, ResNets, U-Nets, and ViTs, as well as advanced CapsNets methods such as EM-CapsNets and Efficient-CapsNets. Notably, our model is robust to class imbalance, achieving high precision even on datasets with a large proportion of negative cases.
https://arxiv.org/abs/2403.09108
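As a rough sketch of how a weighted margin loss with an EF-regression auxiliary term could be combined (the abstract above describes the idea but not the exact formulation), the snippet below applies a positive-class weight to the standard CapsNet margin loss and adds a mean-squared EF term; pos_weight, aux_weight and the margin values are assumptions, not CardioCaps' published hyperparameters:

import torch
import torch.nn.functional as F

def weighted_margin_loss(caps_lengths, targets, pos_weight=2.0,
                         m_plus=0.9, m_minus=0.1, lam=0.5):
    # caps_lengths: (batch, num_classes) capsule vector norms
    # targets: (batch, num_classes) one-hot labels
    pos_term = targets * F.relu(m_plus - caps_lengths) ** 2
    neg_term = (1.0 - targets) * F.relu(caps_lengths - m_minus) ** 2
    return (pos_weight * pos_term + lam * neg_term).sum(dim=1).mean()

def cardiocaps_style_loss(caps_lengths, targets, ef_pred, ef_true, aux_weight=0.1):
    # Total loss = class-weighted margin loss + auxiliary EF regression loss.
    cls_loss = weighted_margin_loss(caps_lengths, targets)
    ef_loss = F.mse_loss(ef_pred, ef_true)
    return cls_loss + aux_weight * ef_loss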
This study aimed to develop a deep learning model for the classification of bearing faults in wind turbine generators from acoustic signals. A convolutional LSTM model was successfully constructed and trained using audio data from five predefined fault types for both training and validation. To create the dataset, raw audio signal data was collected and processed in frames to capture time- and frequency-domain information. The model exhibited outstanding accuracy on the training samples and demonstrated excellent generalization ability during validation. On the test samples, the model achieved remarkable classification performance, with an overall accuracy exceeding 99.5% and a false positive rate of less than 1% for normal status. The findings of this study provide essential support for the diagnosis and maintenance of bearing faults in wind turbine generators, with the potential to enhance the reliability and efficiency of wind power generation.
This study aimed to develop a deep learning model for classifying bearing faults in wind turbine generators from acoustic signals. A convolutional LSTM model was successfully built and trained using audio data from five predefined fault types for both training and validation. To create the dataset, raw audio signals were collected and processed in frames to capture time- and frequency-domain information. The model showed excellent accuracy on the training samples and generalized well on the validation samples. On the test samples it achieved remarkable classification performance, with an overall accuracy above 99.5% and a false positive rate below 1% for the normal status, providing essential support for diagnosing and maintaining bearing faults in wind turbine generators and the potential to improve the reliability and efficiency of wind power generation.
https://arxiv.org/abs/2403.09030
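The abstract above describes framing raw audio and classifying the frames with a convolutional LSTM; a hedged PyTorch sketch of that pipeline follows. Frame length, layer sizes and the five-class head are placeholder assumptions, not the authors' architecture:

import torch
import torch.nn as nn

def frame_signal(audio: torch.Tensor, frame_len: int = 1024, hop: int = 512):
    # Split a 1D audio signal into overlapping frames (time-domain features);
    # per-frame spectra could be appended for frequency-domain information.
    return audio.unfold(0, frame_len, hop)           # (num_frames, frame_len)

class ConvLSTMClassifier(nn.Module):
    def __init__(self, frame_len: int = 1024, num_classes: int = 5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(16),
        )
        self.lstm = nn.LSTM(input_size=32 * 16, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, frames):                        # frames: (batch, num_frames, frame_len)
        b, t, f = frames.shape
        x = self.conv(frames.reshape(b * t, 1, f)).reshape(b, t, -1)
        _, (h, _) = self.lstm(x)                      # keep the last hidden state
        return self.head(h[-1])                       # (batch, num_classes) logits

# Hypothetical usage:
# frames = frame_signal(torch.randn(16000))          # one second at 16 kHz (placeholder)
# logits = ConvLSTMClassifier()(frames.unsqueeze(0))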
Supervised deep learning techniques can be used to generate synthetic 7T MRIs from 3T MRI inputs. This image enhancement process leverages the advantages of ultra-high-field MRI to improve the signal-to-noise and contrast-to-noise ratios of 3T acquisitions. In this paper, we introduce multiple novel 7T synthesization algorithms based on custom-designed variants of the V-Net convolutional neural network. We demonstrate that the V-Net based model has superior performance in enhancing both single-site and multi-site MRI datasets compared to the existing benchmark model. When trained on 3T-7T MRI pairs from 8 subjects with mild Traumatic Brain Injury (TBI), our model achieves state-of-the-art 7T synthesization performance. Compared to previous works, synthetic 7T images generated from our pipeline also display superior enhancement of pathological tissue. Additionally, we implement and test a data augmentation scheme for training models that are robust to variations in the input distribution. This allows synthetic 7T models to accommodate intra-scanner and inter-scanner variability in multisite datasets. On a harmonized dataset consisting of 18 3T-7T MRI pairs from two institutions, including both healthy subjects and those with mild TBI, our model maintains its performance and can generalize to 3T MRI inputs with lower resolution. Our findings demonstrate the promise of V-Net based models for MRI enhancement and offer a preliminary probe into improving the generalizability of synthetic 7T models with data augmentation.
Supervised deep learning techniques can be used to generate synthetic 7T MRIs from 3T MRI inputs. This image enhancement process exploits the advantages of ultra-high-field MRI to improve the signal-to-noise and contrast-to-noise ratios of 3T acquisitions. In this paper, we introduce several novel 7T synthesization algorithms based on custom-designed variants of the V-Net convolutional neural network. We show that the V-Net based model outperforms the existing benchmark model in enhancing both single-site and multi-site MRI datasets. When trained on 3T-7T MRI pairs from 8 subjects with mild Traumatic Brain Injury (TBI), our model achieves state-of-the-art 7T synthesization performance. Compared with previous work, synthetic 7T images produced by our pipeline also show superior enhancement of pathological tissue. In addition, we implement and test a data augmentation scheme for training models that are robust to variations in the input distribution, which allows synthetic 7T models to accommodate intra-scanner and inter-scanner variability in multi-site datasets. On a harmonized dataset of 18 3T-7T MRI pairs from two institutions, including both healthy subjects and subjects with mild TBI, our model maintains its performance and can generalize to lower-resolution 3T MRI inputs. Our findings demonstrate the promise of V-Net based models for MRI enhancement and offer a preliminary probe into improving the generalizability of synthetic 7T models with data augmentation.
https://arxiv.org/abs/2403.08979
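The abstract above does not spell out its data augmentation scheme, so the following NumPy sketch only illustrates the kind of input-distribution perturbations (global gain, a crude bias-like ramp, acquisition noise) one might use to mimic intra- and inter-scanner variability; the ranges, shapes and function name are assumptions:

import numpy as np

def augment_3t_volume(vol: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Illustrative scanner-variability augmentations for a 3T volume."""
    out = vol.astype(np.float32)
    out *= rng.uniform(0.9, 1.1)                              # global gain difference
    gain = np.linspace(*rng.uniform(0.95, 1.05, size=2), vol.shape[0])
    out *= gain[:, None, None]                                # crude bias-field-like ramp
    out += rng.normal(0.0, 0.01 * out.std(), size=out.shape)  # acquisition noise
    return out

# Hypothetical usage (shapes are placeholders):
# rng = np.random.default_rng(0)
# aug = augment_3t_volume(np.zeros((64, 128, 128), dtype=np.float32), rng)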
The aim of this paper is to evaluate the use of D-CNN (Deep Convolutional Neural Network) algorithms to classify pig body conditions as normal or abnormal, with a focus on characteristics observed in sanitary monitoring; six different algorithms were used for this task. The study focused on five pig characteristics: caudophagy, ear hematoma, scratches on the body, redness, and natural stains (brown or black). The results of the study showed that D-CNNs were effective in classifying deviations in pig body morphology related to skin characteristics. The evaluation was conducted by analyzing the performance metrics Precision, Recall, and F-score, as well as the statistical analyses ANOVA and the Scott-Knott test. The contribution of this article is the proposal of using D-CNN networks for morphological classification in pigs, with a focus on characteristics identified in sanitary monitoring. Among the best results, an average Precision of 80.6% for classifying caudophagy was achieved with the InceptionResNetV2 network, indicating the potential of this technology for the proposed task. Additionally, a new image database was created, containing various distinct pig body characteristics, which can serve as data for future research.
This paper aims to evaluate the use of D-CNN (Deep Convolutional Neural Network) algorithms to classify pig body conditions as normal or abnormal, focusing on characteristics observed during sanitary monitoring; six different algorithms were used for this task. The study focused on five pig characteristics: caudophagy, ear hematoma, scratches on the body, redness, and natural stains (brown or black). The results show that D-CNNs are effective at classifying deviations in pig body morphology related to skin characteristics. The evaluation was carried out with the performance metrics Precision, Recall, and F-score, as well as the statistical analyses ANOVA and the Scott-Knott test. The contribution of this article is the proposal of using D-CNN networks for morphological classification in pigs, focusing on characteristics identified during sanitary monitoring. Among the best results, the InceptionResNetV2 network achieved an average Precision of 80.6% for classifying caudophagy, indicating the potential of this technology for the proposed task. In addition, a new image database containing various distinct pig body characteristics was created, which can serve as data for future research.
https://arxiv.org/abs/2403.08962
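For readers unfamiliar with the setup, a typical transfer-learning classifier built on the InceptionResNetV2 backbone mentioned above might look like the Keras sketch below; the input size, frozen backbone, dropout rate and five-class head are illustrative assumptions rather than the authors' exact configuration:

import tensorflow as tf

NUM_CLASSES = 5  # caudophagy, ear hematoma, scratches, redness, natural stains

base = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
base.trainable = False  # start by fine-tuning only the new classification head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Precision, Recall and F-score per class can then be computed on held-out
# predictions, e.g. with sklearn.metrics.precision_recall_fscore_support.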
This article explores the latest Convolutional Neural Networks (CNNs) for cloud detection aboard hyperspectral satellites. The performance of the latest 1D CNN (1D-Justo-LiuNet) and two recent 2D CNNs (nnU-net and 2D-Justo-UNet-Simple) for cloud segmentation and classification is assessed. Evaluation criteria include precision and computational efficiency for in-orbit deployment. Experiments utilize NASA's EO-1 Hyperion data, with varying numbers of spectral channels after Principal Component Analysis. Results indicate that 1D-Justo-LiuNet achieves the highest accuracy, outperforming the 2D CNNs, while maintaining compactness with larger spectral channel sets, albeit with increased inference times. However, the performance of the 1D CNN degrades with significant channel reduction. In this context, 2D-Justo-UNet-Simple offers the best balance for in-orbit deployment, considering precision, memory, and time costs. While nnU-net is suitable for on-ground processing, deployment of the lightweight 1D-Justo-LiuNet is recommended for high-precision applications. Alternatively, the lightweight 2D-Justo-UNet-Simple is recommended for balanced costs between timing and precision in orbit.
This article explores the latest Convolutional Neural Networks (CNNs) for cloud detection aboard hyperspectral satellites. The cloud segmentation and classification performance of the latest 1D CNN (1D-Justo-LiuNet) and two recent 2D CNNs (nnU-net and 2D-Justo-UNet-Simple) is assessed. Evaluation criteria include precision and computational efficiency for in-orbit deployment. The experiments use NASA's EO-1 Hyperion data with varying numbers of spectral channels after Principal Component Analysis. The results show that 1D-Justo-LiuNet achieves the highest accuracy, outperforming the 2D CNNs, and remains compact with larger spectral channel sets, albeit with increased inference times. However, the performance of the 1D CNN degrades when the number of channels is significantly reduced. In this context, 2D-Justo-UNet-Simple offers the best balance for in-orbit deployment when precision, memory, and time costs are considered. While nnU-net is suitable for on-ground processing, the lightweight 1D-Justo-LiuNet is recommended for high-precision applications, and the lightweight 2D-Justo-UNet-Simple is recommended for a balanced trade-off between timing and precision in orbit.
https://arxiv.org/abs/2403.08695
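The channel reduction described above (Principal Component Analysis applied to the spectral bands before the CNNs) can be sketched in a few lines; the cube shape and number of components below are placeholders, not the paper's settings:

import numpy as np
from sklearn.decomposition import PCA

def reduce_spectral_channels(cube: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project a hyperspectral cube (H, W, bands) onto its first principal
    components, yielding an (H, W, n_components) input for the CNNs."""
    h, w, bands = cube.shape
    flat = cube.reshape(-1, bands).astype(np.float32)   # one spectrum per pixel
    reduced = PCA(n_components=n_components).fit_transform(flat)
    return reduced.reshape(h, w, n_components)

# Hypothetical usage with a Hyperion-like scene (values are placeholders):
# cube = np.random.rand(256, 256, 198).astype(np.float32)
# x = reduce_spectral_channels(cube, n_components=3)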
Convolutional Neural Networks (CNNs) are nowadays the model of choice in Computer Vision, thanks to their ability to automate the feature extraction process in visual tasks. However, the knowledge acquired during training is fully subsymbolic, and hence difficult to understand and explain to end users. In this paper, we propose a new technique called HOLMES (HOLonym-MEronym based Semantic inspection) that decomposes a label into a set of related concepts, and provides component-level explanations for an image classification model. Specifically, HOLMES leverages ontologies, web scraping and transfer learning to automatically construct meronym (part)-based detectors for a given holonym (class). Then, it produces heatmaps at the meronym level and finally, by probing the holonym CNN with occluded images, it highlights the importance of each part on the classification output. Compared to state-of-the-art saliency methods, HOLMES takes a step further and provides information about both where and what the holonym CNN is looking at, without relying on densely annotated datasets and without forcing concepts to be associated with single computational units. Extensive experimental evaluation on different categories of objects (animals, tools and vehicles) shows the feasibility of our approach. On average, HOLMES explanations include at least two meronyms, and the ablation of a single meronym roughly halves the holonym model's confidence. The resulting heatmaps were quantitatively evaluated using deletion/insertion/preservation curves. All metrics were comparable to those achieved by GradCAM, while offering the advantage of further decomposing the heatmap into human-understandable concepts, thus highlighting both the relevance of meronyms to object classification and HOLMES's ability to capture it. The code is available at this https URL.
Convolutional Neural Networks (CNNs) are nowadays the model of choice in Computer Vision, thanks to their ability to automate the feature extraction process in visual tasks. However, the knowledge acquired during training is fully subsymbolic and therefore difficult for end users to understand and explain. In this paper, we propose a new technique, HOLMES (HOLonym-MEronym based Semantic inspection), which decomposes a label into a set of related concepts and provides component-level explanations for an image classification model. Specifically, HOLMES uses ontologies, web scraping, and transfer learning to automatically build meronym (part)-based detectors for a given holonym (class). It then produces heatmaps at the meronym level and, finally, by probing the holonym CNN with occluded images, highlights the importance of each part for the classification output. Compared with state-of-the-art saliency methods, HOLMES goes a step further and provides information about both where and what the holonym CNN is looking at, without relying on densely annotated datasets and without forcing concepts to be associated with single computational units. Extensive experimental evaluation on different object categories (animals, tools, and vehicles) shows the feasibility of our approach. On average, HOLMES explanations include at least two meronyms, and ablating a single meronym roughly halves the confidence of the holonym model. The resulting heatmaps were quantitatively evaluated using deletion/insertion/preservation curves. All metrics were comparable to those of GradCAM, while offering the additional advantage of decomposing the heatmap into human-understandable concepts, highlighting both the relevance of meronyms to object classification and HOLMES's ability to capture it. The code is available at this https URL.
https://arxiv.org/abs/2403.08536
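As a minimal illustration of the occlusion-probing step described above (scoring a meronym by how much masking its pixels lowers the holonym class probability), here is a hedged PyTorch sketch; the function name, zero-filling occlusion and softmax scoring are assumptions, not HOLMES's released code:

import torch

def part_importance(model, image, part_mask, class_idx):
    """Score a meronym by the drop in holonym class probability when its
    pixels are occluded. `image` is (C, H, W); `part_mask` is a boolean
    (H, W) tensor marking the part's pixels."""
    model.eval()
    with torch.no_grad():
        base = torch.softmax(model(image.unsqueeze(0)), dim=1)[0, class_idx]
        occluded = image.clone()
        occluded[:, part_mask] = 0.0                 # black out the part region
        masked = torch.softmax(model(occluded.unsqueeze(0)), dim=1)[0, class_idx]
    return (base - masked).item()                    # larger = more important part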