Predicting Remaining Useful Life (RUL) plays a crucial role in the prognostics and health management of industrial systems that involve a variety of interrelated sensors. Given a constant stream of time series sensory data from such systems, deep learning models have risen to prominence in identifying complex, nonlinear temporal dependencies in these data. Beyond the temporal dependencies of individual sensors, spatial dependencies emerge as important correlations among these sensors, which can be naturally modelled by a temporal graph that describes time-varying spatial relationships. However, the majority of existing studies rely on capturing discrete snapshots of this temporal graph, a coarse-grained approach that leads to loss of temporal information. Moreover, given the variety of heterogeneous sensors, it becomes vital that this inherent heterogeneity is leveraged for RUL prediction in temporal sensor graphs. To capture the nuances of the temporal and spatial relationships and the heterogeneous characteristics of an interconnected graph of sensors, we introduce a novel model named Temporal and Heterogeneous Graph Neural Networks (THGNN). Specifically, THGNN aggregates historical data from neighboring nodes to accurately capture the temporal dynamics and spatial correlations within the stream of sensor data in a fine-grained manner. Moreover, the model leverages Feature-wise Linear Modulation (FiLM) to address the diversity of sensor types, significantly improving its capacity to learn the heterogeneity in the data sources. Finally, we validate the effectiveness of our approach through comprehensive experiments. Our empirical findings demonstrate significant advancements on the N-CMAPSS dataset, achieving improvements of up to 19.2% and 31.6% on two different evaluation metrics over state-of-the-art methods.
https://arxiv.org/abs/2405.04336
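FiLM, mentioned in the abstract above, conditions intermediate features on side information through a per-channel affine transform. A minimal sketch of that mechanic, with a hypothetical per-sensor-type parameter table (in a model like THGNN the gamma/beta would be produced by a learned network, not fixed constants):

```python
# Minimal FiLM (Feature-wise Linear Modulation) sketch:
# out[c] = gamma[c] * h[c] + beta[c], with gamma/beta chosen by the
# conditioning input (here: a sensor-type id).  The parameter table
# below is hypothetical and exists only for illustration.
FILM_PARAMS = {
    "temperature": {"gamma": [1.5, 0.5], "beta": [0.1, -0.2]},
    "pressure":    {"gamma": [0.8, 1.2], "beta": [0.0, 0.3]},
}

def film(features, sensor_type):
    """Modulate a feature vector according to its sensor type."""
    p = FILM_PARAMS[sensor_type]
    return [g * h + b for g, h, b in zip(p["gamma"], features, p["beta"])]

print(film([2.0, 4.0], "temperature"))  # [3.1, 1.8]
```

Because gamma and beta depend only on the sensor type, one shared backbone can specialize its features per heterogeneous data source.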
Since late 2022, generative AI has taken the world by storm, with widespread use of tools including ChatGPT, Gemini, and Claude. Generative AI and large language model (LLM) applications are transforming how individuals find and access data and knowledge. However, the intricate relationship between open data and generative AI, and the vast potential it holds for driving innovation in this field, remain underexplored areas. This white paper seeks to unpack the relationship between open data and generative AI and explore possible components of a new Fourth Wave of Open Data: Is open data becoming AI ready? Is open data moving towards a data commons approach? Is generative AI making open data more conversational? Will generative AI improve open data quality and provenance? Towards this end, we provide a new Spectrum of Scenarios framework. This framework outlines a range of scenarios in which open data and generative AI could intersect and what is required from a data quality and provenance perspective to make open data ready for those specific scenarios. These scenarios include: pretraining, adaptation, inference and insight generation, data augmentation, and open-ended exploration. Through this process, we found that in order for data holders to embrace generative AI to improve open data access and develop greater insights from open data, they first must make progress around five key areas: enhance transparency and documentation, uphold quality and integrity, promote interoperability and standards, improve accessibility and usability, and address ethical considerations.
https://arxiv.org/abs/2405.04333
Evolution Strategies (ES) are effective gradient-free optimization methods that can be competitive with gradient-based approaches for policy search. ES rely only on the total episodic scores of solutions in their population, from which they estimate fitness gradients for their update without access to true gradient information. However, this makes them sensitive to deceptive fitness landscapes, and they tend to explore only one way to solve a problem. Quality-Diversity methods such as MAP-Elites introduce additional information with behavior descriptors (BD) to return a population of diverse solutions, which helps exploration but leaves a large part of the evaluation budget unfocused on finding the best-performing solution. Here we show that behavior information can also be leveraged to find the best policy, by identifying promising search areas which can then be efficiently explored with ES. We introduce the framework of Quality with Just Enough Diversity (JEDi), which learns the relationship between behavior and fitness to focus evaluations on solutions that matter. When trying to reach higher fitness values, JEDi outperforms both QD and ES methods on hard exploration tasks like mazes and on complex control problems with large policies.
https://arxiv.org/abs/2405.04308
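The fitness-gradient estimate from episodic scores that the abstract above builds on can be sketched in a few lines. This toy example (my own illustration, not the JEDi algorithm) maximizes a 1-D quadratic using only black-box scores:

```python
import random

def es_step(theta, fitness, pop=50, sigma=0.1, lr=0.05):
    """One Evolution Strategies update: score symmetric perturbations of
    theta and move along the fitness-weighted noise direction.  Only the
    episodic scores are used; no true gradient is ever computed."""
    grad = 0.0
    for _ in range(pop):
        eps = random.gauss(0.0, 1.0)
        # antithetic sampling (+/- the same noise) reduces estimator variance
        grad += (fitness(theta + sigma * eps) - fitness(theta - sigma * eps)) * eps / 2.0
    return theta + lr * grad / (pop * sigma)

random.seed(0)
f = lambda x: -(x - 3.0) ** 2          # toy fitness, optimum at x = 3
theta = 0.0
for _ in range(200):
    theta = es_step(theta, f)
print(round(theta, 2))                  # converges near 3.0
```

On a deceptive landscape this estimator would follow the local fitness slope into a trap, which is the failure mode behavior information is meant to fix.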
When deploying deep neural networks on robots or other physical systems, the learned model should reliably quantify predictive uncertainty. A reliable uncertainty allows downstream modules to reason about the safety of its actions. In this work, we address metrics for evaluating such an uncertainty. Specifically, we focus on regression tasks, and investigate Area Under Sparsification Error (AUSE), Calibration Error, Spearman's Rank Correlation, and Negative Log-Likelihood (NLL). Using synthetic regression datasets, we look into how those metrics behave under four typical types of uncertainty, their stability regarding the size of the test set, and reveal their strengths and weaknesses. Our results indicate that Calibration Error is the most stable and interpretable metric, but AUSE and NLL also have their respective use cases. We discourage the usage of Spearman's Rank Correlation for evaluating uncertainties and recommend replacing it with AUSE.
https://arxiv.org/abs/2405.04278
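Under the common definition, AUSE is the area between the sparsification curve obtained by removing the most uncertain predictions first and an oracle curve that removes the largest errors first; zero means uncertainty ranks errors perfectly. A minimal sketch under that assumption (not necessarily the paper's exact normalization):

```python
def sparsification_curve(errors, order):
    """Mean remaining error after removing samples in `order`, one at a time."""
    remaining = [errors[i] for i in order]
    curve = []
    while remaining:
        curve.append(sum(remaining) / len(remaining))
        remaining.pop(0)  # drop the next-ranked sample
    return curve

def ause(errors, uncertainties):
    by_unc = sorted(range(len(errors)), key=lambda i: -uncertainties[i])
    by_err = sorted(range(len(errors)), key=lambda i: -errors[i])  # oracle
    unc_curve = sparsification_curve(errors, by_unc)
    oracle_curve = sparsification_curve(errors, by_err)
    return sum(u - o for u, o in zip(unc_curve, oracle_curve)) / len(unc_curve)

errs = [4.0, 1.0, 3.0, 2.0]
print(ause(errs, [4.0, 1.0, 3.0, 2.0]))      # 0.0: uncertainty ranks errors perfectly
print(ause(errs, [1.0, 4.0, 2.0, 3.0]) > 0)  # True: misranked uncertainty leaves area
```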
Considering learner engagement benefits both learners and instructors. Instructors can help learners increase their attention, involvement, motivation, and interest. In turn, instructors can improve their own instructional performance by evaluating the cumulative results of all learners and upgrading their training programs. This paper proposes a general, lightweight model for selecting and processing features to detect learners' engagement levels while preserving the sequential temporal relationship over time. During training and testing, we analyzed the videos from the publicly available DAiSEE dataset to capture the dynamic essence of learner engagement. We also propose an adaptation policy that derives new labels from the education-related affective states of this dataset, thereby improving the models' judgment. The suggested model achieves an accuracy of 68.57% in a specific implementation and outperforms the studied state-of-the-art models for detecting learners' engagement levels.
https://arxiv.org/abs/2405.04251
Graph self-supervised learning has sparked a research surge in training informative representations without accessing any labeled data. However, our understanding of graph self-supervised learning remains limited, and the inherent relationships between various self-supervised tasks are still unexplored. Our paper aims to provide a fresh understanding of graph self-supervised learning based on task correlations. Specifically, we evaluate the performance of the representations trained by one specific task on other tasks and define correlation values to quantify task correlations. Through this process, we unveil the task correlations between various self-supervised tasks and can measure their expressive capabilities, which are closely related to downstream performance. By analyzing the correlation values between tasks across various datasets, we reveal the complexity of task correlations and the limitations of existing multi-task learning methods. To obtain more capable representations, we propose Graph Task Correlation Modeling (GraphTCM) to illustrate the task correlations and utilize it to enhance graph self-supervised training. The experimental results indicate that our method significantly outperforms existing methods across various downstream tasks.
https://arxiv.org/abs/2405.04245
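The correlation values described above can be illustrated by scoring each task's representations across several settings and correlating the resulting performance vectors. The scores below are invented purely for illustration; the paper defines its own correlation values:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length performance vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical downstream scores of representations trained by each
# self-supervised task, evaluated on four datasets.
scores = {
    "contrastive":     [0.81, 0.74, 0.66, 0.90],
    "masking":         [0.80, 0.73, 0.65, 0.91],
    "link_prediction": [0.55, 0.88, 0.72, 0.60],
}
corr = pearson(scores["contrastive"], scores["masking"])
print(round(corr, 2))  # close to 1: the two tasks behave similarly
```

A high value suggests two tasks train interchangeable representations, while a low or negative one signals complementary tasks worth combining.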
Deep neural networks give us a powerful method to model the relationship between a training dataset's inputs and outputs. We can regard such a network as a complex adaptive system consisting of many artificial neurons that, as a whole, work as an adaptive memory. The network's behavior is training dynamics with a feedback loop from the evaluation of the loss function. We already know the training response can be constant or show power-law-like aging in some ideal situations. However, gaps remain between those findings and other complex phenomena, like network fragility. To fill the gap, we introduce a very simple network and analyze it. We show that the training response consists of different factors depending on training stage, activation function, and training method. In addition, we show feature space reduction as an effect of stochastic training dynamics, which can result in network fragility. Finally, we discuss some complex phenomena of deep networks.
https://arxiv.org/abs/2405.04074
This research examines the effectiveness of OpenAI's GPT models as independent evaluators of text summaries generated by six transformer-based models from Hugging Face: DistilBART, BERT, ProphetNet, T5, BART, and PEGASUS. We evaluated these summaries based on essential properties of a high-quality summary - conciseness, relevance, coherence, and readability - using traditional metrics such as ROUGE and Latent Semantic Analysis (LSA). Uniquely, we also employed GPT not as a summarizer but as an evaluator, allowing it to independently assess summary quality without predefined metrics. Our analysis revealed significant correlations between GPT evaluations and traditional metrics, particularly in assessing relevance and coherence. The results demonstrate GPT's potential as a robust tool for evaluating text summaries, offering insights that complement established metrics and providing a basis for comparative analysis of transformer-based models in natural language processing tasks.
https://arxiv.org/abs/2405.04053
Feature compression, as an important branch of video coding for machines (VCM), has attracted significant attention and exploration. However, the existing methods mainly focus on intra-feature similarity, such as the Mean Squared Error (MSE) between the reconstructed and original features, while neglecting the importance of inter-feature relationships. In this paper, we analyze the inter-feature relationships, focusing on feature discriminability in machine vision and underscoring its significance in feature compression. To maintain the feature discriminability of reconstructed features, we introduce a discrimination metric for feature compression. The discrimination metric is designed to ensure that the distance between features of the same category is smaller than the distance between features of different categories. Furthermore, we explore the relationship between the discrimination metric and the discriminability of the original features. Experimental results confirm the effectiveness of the proposed discrimination metric and reveal there exists a trade-off between the discrimination metric and the discriminability of the original features.
https://arxiv.org/abs/2405.04044
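The constraint in the abstract above (same-category features closer together than different-category ones) is essentially a triplet-style margin condition. A hedged sketch of one way to count violations of it, not the paper's exact metric:

```python
def euclid(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def discrimination_violations(features, labels, margin=0.0):
    """Count (anchor, positive, negative) triples where a same-category
    distance fails to be smaller (by `margin`) than a different-category
    distance -- zero means the features are fully discriminable."""
    n, bad = len(features), 0
    for a in range(n):
        for p in range(n):
            for q in range(n):
                if a != p and labels[a] == labels[p] and labels[a] != labels[q]:
                    if euclid(features[a], features[p]) + margin >= euclid(features[a], features[q]):
                        bad += 1
    return bad

feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels = [0, 0, 1, 1]
print(discrimination_violations(feats, labels))  # 0: categories well separated
```

A compression loss built from such a term keeps reconstructed features useful for classification even when MSE alone would allow category overlap.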
Autonomous driving perception models are typically composed of multiple functional modules that interact through complex relationships to accomplish environment understanding. However, perception models are predominantly optimized as a black box through end-to-end training, lacking independent evaluation of functional modules, which poses difficulties for interpretability and optimization. As a first step toward addressing this issue, we propose an evaluation method based on feature map analysis to gauge the convergence of a model, thereby assessing functional modules' training maturity. We construct a quantitative metric named the Feature Map Convergence Score (FMCS) and develop the Feature Map Convergence Evaluation Network (FMCE-Net) to measure and predict the convergence degree of models, respectively. FMCE-Net achieves remarkable predictive accuracy for FMCS across multiple image classification experiments, validating the efficacy and robustness of the introduced approach. To the best of our knowledge, this is the first independent evaluation method for functional modules, offering a new paradigm for training assessment of perception models.
https://arxiv.org/abs/2405.04041
Sora promises to redefine the way visual content is created. Despite its numerous forecasted benefits, the drivers of user willingness to use the text-to-video (T2V) model are unknown. This study extends the extended unified theory of acceptance and use of technology (UTAUT2) with perceived realism and novelty value. Using a purposive sampling method, we collected data from 940 respondents in the US and analyzed the sample using covariance-based structural equation modeling and fuzzy set qualitative comparative analysis (fsQCA). The findings reveal that all hypothesized relationships are supported, with perceived realism emerging as the most influential driver, followed by novelty value. Moreover, fsQCA identifies five configurations leading to high and low willingness to use, and the model demonstrates high predictive validity, contributing to theory advancement. Our study provides valuable insights for developers and marketers, offering guidance for strategic decisions to promote the widespread adoption of T2V models.
https://arxiv.org/abs/2405.03986
The iterative character of work in machine learning (ML) and artificial intelligence (AI) and reliance on comparisons against benchmark datasets emphasize the importance of reproducibility in that literature. Yet, resource constraints and inadequate documentation can make running replications particularly challenging. Our work explores the potential of using downstream citation contexts as a signal of reproducibility. We introduce a sentiment analysis framework applied to citation contexts from papers involved in Machine Learning Reproducibility Challenges in order to interpret the positive or negative outcomes of reproduction attempts. Our contributions include training classifiers for reproducibility-related contexts and sentiment analysis, and exploring correlations between citation context sentiment and reproducibility scores. Study data, software, and an artifact appendix are publicly available at this https URL .
https://arxiv.org/abs/2405.03977
Construction data and robotic sensing data originate from disparate sources and are associated with distinct frames of reference. The primary objective of this study is to align LiDAR point clouds with building information modeling (BIM) using a global point cloud registration approach, aimed at establishing a shared understanding between the two modalities, i.e., ``speak the same language''. To achieve this, we design a cross-modality registration method spanning from the front end to the back end. At the front end, we extract descriptors by identifying walls and capturing the intersected corners. At the back end, we employ the Hough transform to estimate multiple pose candidates. The final pose is verified by wall-pixel correlation. To evaluate the effectiveness of our method, we conducted real-world multi-session experiments in a large-scale university building, involving two different types of LiDAR sensors. We also report our findings and plan to make our collected dataset open-sourced.
https://arxiv.org/abs/2405.03969
Currently, portable electronic devices are becoming more and more popular. For lightweight considerations, their fingerprint recognition modules usually use limited-size sensors. However, partial fingerprints have few matchable features, especially when there are differences in finger pressing posture or image quality, which makes partial fingerprint verification challenging. Most existing methods regard fingerprint position rectification and identity verification as independent tasks, ignoring the coupling relationship between them -- relative pose estimation typically relies on paired features as anchors, and authentication accuracy tends to improve with more precise pose alignment. Consequently, in this paper we propose a method that jointly estimates identity verification and relative pose for partial fingerprints, aiming to leverage their inherent correlation to improve each other. To achieve this, we propose a multi-task CNN (Convolutional Neural Network)-Transformer hybrid network, and design a pre-training task to enhance the feature extraction capability. Experiments on multiple public datasets (NIST SD14, FVC2002 DB1A & DB3A, FVC2004 DB1A & DB2A, FVC2006 DB1A) and an in-house dataset show that our method achieves state-of-the-art performance in both partial fingerprint verification and relative pose estimation, while being more efficient than previous methods.
https://arxiv.org/abs/2405.03959
Graph Neural Networks (GNNs) have excelled in learning from graph-structured data, especially in understanding the relationships within a single graph, i.e., intra-graph relationships. Despite their successes, GNNs are limited by neglecting the context of relationships across graphs, i.e., inter-graph relationships. Recognizing the potential to extend this capability, we introduce Relating-Up, a plug-and-play module that enhances GNNs by exploiting inter-graph relationships. This module incorporates a relation-aware encoder and a feedback training strategy. The former enables GNNs to capture relationships across graphs, enriching relation-aware graph representations through collective context. The latter uses a feedback loop mechanism for the recursive refinement of these representations, leveraging insights from inter-graph dynamics to drive the loop. The synergy between these two innovations results in a robust and versatile module. Relating-Up enhances the expressiveness of GNNs, enabling them to encapsulate a wider spectrum of graph relationships with greater precision. Our evaluations across 16 benchmark datasets demonstrate that integrating Relating-Up into GNN architectures substantially improves performance, positioning Relating-Up as a formidable choice for a broad spectrum of graph representation learning tasks.
https://arxiv.org/abs/2405.03950
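Attending across graph embeddings, as the relation-aware encoder above does, can be sketched with plain dot-product attention plus a residual connection. This is my illustration of the general idea, not the paper's encoder:

```python
from math import exp

def attend_across_graphs(graph_embs):
    """Refine each graph embedding with a softmax-weighted sum of all
    graph embeddings in the batch (dot-product similarity as the score),
    so every graph is enriched by collective context."""
    refined = []
    for q in graph_embs:
        scores = [sum(a * b for a, b in zip(q, k)) for k in graph_embs]
        m = max(scores)                      # stabilize the softmax
        w = [exp(s - m) for s in scores]
        z = sum(w)
        w = [x / z for x in w]
        ctx = [sum(w[j] * graph_embs[j][d] for j in range(len(graph_embs)))
               for d in range(len(q))]
        # residual connection: original embedding plus inter-graph context
        refined.append([a + b for a, b in zip(q, ctx)])
    return refined

embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attend_across_graphs(embs)
```

In a trained module the scores would come from learned projections; here raw dot products stand in to keep the sketch self-contained.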
Deep learning-based predictive models, leveraging Electronic Health Records (EHR), are receiving increasing attention in healthcare. An effective representation of a patient's EHR should hierarchically encompass both the temporal relationships between historical visits and medical events, and the inherent structural information within these elements. Existing patient representation methods can be roughly categorized into sequential representation and graphical representation. The sequential representation methods focus only on the temporal relationships among longitudinal visits. On the other hand, the graphical representation approaches, while adept at extracting the graph-structured relationships between various medical events, fall short of effectively integrating temporal information. To capture both types of information, we model a patient's EHR as a novel temporal heterogeneous graph. This graph includes historical visit nodes and medical event nodes. It propagates structured information from medical event nodes to visit nodes and utilizes time-aware visit nodes to capture changes in the patient's health status. Furthermore, we introduce a novel temporal graph transformer (TRANS) that integrates temporal edge features, global positional encoding, and local structural encoding into heterogeneous graph convolution, capturing both temporal and structural information. We validate the effectiveness of TRANS through extensive experiments on three real-world datasets. The results show that our proposed approach achieves state-of-the-art performance.
https://arxiv.org/abs/2405.03943
Despite notable successes of Reinforcement Learning (RL), the prevalent use of an online learning paradigm prevents its widespread adoption, especially in hazardous or costly scenarios. Offline RL has emerged as an alternative solution, learning from pre-collected static datasets. However, this offline learning introduces a new challenge known as distributional shift, degrading the performance when the policy is evaluated on scenarios that are Out-Of-Distribution (OOD) from the training dataset. Most existing offline RL methods resolve this issue by regularizing policy learning within the information supported by the given dataset. However, such regularization overlooks the potential for high-reward regions that may exist beyond the dataset. This motivates exploring novel offline learning techniques that can make improvements beyond the data support without compromising policy performance, potentially by learning causation (cause-and-effect) instead of correlation from the dataset. In this paper, we propose the MOOD-CRL (Model-based Offline OOD-Adapting Causal RL) algorithm, which aims to address the challenge of extrapolation for offline policy training through causal inference instead of policy-regularizing methods. Specifically, a Causal Normalizing Flow (CNF) is developed to learn the transition and reward functions for data generation and augmentation in offline policy evaluation and training. Based on a data-invariant, physics-based qualitative causal graph and the observational data, we develop a novel learning scheme for CNF to learn the quantitative structural causal model. As a result, CNF gains predictive and counterfactual reasoning capabilities for sequential decision-making tasks, revealing a high potential for OOD adaptation. Our CNF-based offline RL approach is validated through empirical evaluations, outperforming model-free and model-based methods by a significant margin.
https://arxiv.org/abs/2405.03892
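A normalizing flow is an invertible transform with a tractable log-determinant; the causal variant in the abstract above is far more elaborate, but the core mechanics can be sketched with a single affine (scale-and-shift) flow step:

```python
from math import exp

class AffineFlow:
    """One scale-and-shift flow step: y = exp(s) * x + t, per dimension.
    Exactly invertible, with log|det J| = sum(s)."""
    def __init__(self, s, t):
        self.s, self.t = s, t

    def forward(self, x):
        y = [exp(si) * xi + ti for si, xi, ti in zip(self.s, x, self.t)]
        return y, sum(self.s)  # log-determinant of the Jacobian

    def inverse(self, y):
        return [(yi - ti) / exp(si) for si, yi, ti in zip(self.s, y, self.t)]

flow = AffineFlow(s=[0.5, -0.2], t=[1.0, 2.0])
y, log_det = flow.forward([0.3, 0.7])
x_back = flow.inverse(y)
print([round(v, 6) for v in x_back])  # [0.3, 0.7]
```

The exact invertibility is what lets a flow both evaluate likelihoods of observed transitions and generate augmented data for offline training; a causal version additionally orders the dimensions according to a causal graph.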
This article explores how emerging generative artificial intelligence (GenAI) models, such as large language models (LLMs), can enhance solution methodologies within process systems engineering (PSE). These cutting-edge GenAI models, particularly foundation models (FMs), which are pre-trained on extensive, general-purpose datasets, offer versatile adaptability for a broad range of tasks, including responding to queries, image generation, and complex decision-making. Given the close relationship between advancements in PSE and developments in computing and systems technologies, exploring the synergy between GenAI and PSE is essential. We begin our discussion with a compact overview of both classic and emerging GenAI models, including FMs, and then dive into their applications within key PSE domains: synthesis and design, optimization and integration, and process monitoring and control. In each domain, we explore how GenAI models could potentially advance PSE methodologies, providing insights and prospects for each area. Furthermore, the article identifies and discusses potential challenges in fully leveraging GenAI within PSE, including multiscale modeling, data requirements, evaluation metrics and benchmarks, and trust and safety, thereby deepening the discourse on effective GenAI integration into systems analysis, design, optimization, operations, monitoring, and control. This paper provides a guide for future research focused on the applications of emerging GenAI in PSE.
https://arxiv.org/abs/2402.10977
Accurate trajectory prediction is crucial for ensuring safe and efficient autonomous driving. However, most existing methods overlook complex interactions between traffic participants that often govern their future trajectories. In this paper, we propose SocialFormer, an agent interaction-aware trajectory prediction method that leverages the semantic relationship between the target vehicle and surrounding vehicles by making use of the road topology. We also introduce an edge-enhanced heterogeneous graph transformer (EHGT) as the aggregator in a graph neural network (GNN) to encode the semantic and spatial agent interaction information. Additionally, we introduce a temporal encoder based on gated recurrent units (GRU) to model the temporal social behavior of agent movements. Finally, we present an information fusion framework that integrates agent encoding, lane encoding, and agent interaction encoding for a holistic representation of the traffic scene. We evaluate SocialFormer for the trajectory prediction task on the popular nuScenes benchmark and achieve state-of-the-art performance.
https://arxiv.org/abs/2405.03809
In this paper, we investigate a novel artificial intelligence generation task, termed generated contents enrichment (GCE). Unlike conventional AI content generation tasks, which enrich the given textual description only implicitly and with limited semantics to generate visually real content, our proposed GCE strives to perform content enrichment explicitly in both the visual and textual domains, so that the enriched contents are visually real, structurally reasonable, and semantically abundant. To solve GCE, we propose a deep end-to-end method that explicitly explores the semantics and inter-semantic relationships during the enrichment. Specifically, we first model the input description as a semantic graph, wherein each node represents an object and each edge corresponds to an inter-object relationship. We then adopt Graph Convolutional Networks on top of the input scene description to predict the enriching objects and their relationships with the input objects. Finally, the enriched graph is fed into an image synthesis model to carry out the visual content generation. Our experiments conducted on the Visual Genome dataset exhibit promising and visually plausible results.
https://arxiv.org/abs/2405.03650