Large Language Models' (LLM) reasoning can be improved using test-time aggregation strategies, i.e., generating multiple samples and voting among generated samples. While these improve performance, they often reach a saturation point. Refinement offers an alternative by using LLM-generated feedback to improve solution quality. However, refinement introduces 3 key challenges: (1) Excessive refinement: Uniformly refining all instances can over-correct and reduce the overall performance. (2) Inability to localize and address errors: LLMs have a limited ability to self-correct and struggle to identify and correct their own mistakes. (3) Insufficient refinement: Deciding how many iterations of refinement are needed is non-trivial, and stopping too soon could leave errors unaddressed. To tackle these issues, we propose MAgICoRe, which avoids excessive refinement by categorizing problem difficulty as easy or hard, solving easy problems with coarse-grained aggregation and hard ones with fine-grained and iterative multi-agent refinement. To improve error localization, we incorporate external step-wise reward model (RM) scores. Moreover, to ensure effective refinement, we employ a multi-agent loop with three agents: Solver, Reviewer (which generates targeted feedback based on step-wise RM scores), and the Refiner (which incorporates feedback). To ensure sufficient refinement, we re-evaluate updated solutions, iteratively initiating further rounds of refinement. We evaluate MAgICoRe on Llama-3-8B and GPT-3.5 and show its effectiveness across 5 math datasets. Even one iteration of MAgICoRe beats Self-Consistency by 3.4%, Best-of-k by 3.2%, and Self-Refine by 4.0% while using less than half the samples. Unlike iterative refinement with baselines, MAgICoRe continues to improve with more iterations. Finally, our ablations highlight the importance of MAgICoRe's RMs and multi-agent communication.
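To make the control flow concrete, below is a minimal sketch of the coarse-to-fine routing and review-refine loop described above. It is not the authors' implementation: `sample_solution`, `score_steps`, `review`, and `refine` are hypothetical callables standing in for the Solver, the external step-wise reward model, the Reviewer, and the Refiner, and the thresholds are placeholders.

```python
from collections import Counter

def solve_with_routing(problem, sample_solution, score_steps, review, refine,
                       k=8, easy_margin=0.75, max_rounds=3):
    """Illustrative coarse-to-fine control flow (not the paper's code): aggregate
    when sampled answers agree, otherwise iterate Reviewer/Refiner guided by
    step-wise reward-model scores. A sample is a (final_answer, steps) pair."""
    samples = [sample_solution(problem) for _ in range(k)]            # Solver
    votes = Counter(answer for answer, _ in samples)
    top_answer, top_count = votes.most_common(1)[0]
    if top_count / k >= easy_margin:                                   # "easy": coarse aggregation suffices
        return top_answer

    # "hard": start from the sample whose weakest step scores highest, then refine it
    best_answer, best_steps = max(samples, key=lambda s: min(score_steps(s[1])))
    for _ in range(max_rounds):
        scores = score_steps(best_steps)                               # external step-wise RM
        feedback = review(problem, best_steps, scores)                 # Reviewer: targeted feedback
        new_answer, new_steps = refine(problem, best_steps, feedback)  # Refiner applies the feedback
        if min(score_steps(new_steps)) <= min(scores):                 # re-evaluate; stop if no gain
            break
        best_answer, best_steps = new_answer, new_steps
    return best_answer
```

The two checks mirror the abstract's concerns: the vote margin avoids excessive refinement on easy problems, and re-scoring each revision decides whether a further round is needed.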
https://arxiv.org/abs/2409.12147
With the ever-growing complexity of models in the field of remote sensing (RS), there is an increasing demand for solutions that balance model accuracy with computational efficiency. Knowledge distillation (KD) has emerged as a powerful tool to meet this need, enabling the transfer of knowledge from large, complex models to smaller, more efficient ones without significant loss in performance. This review article provides an extensive examination of KD and its innovative applications in RS. KD, a technique developed to transfer knowledge from a complex, often cumbersome model (teacher) to a more compact and efficient model (student), has seen significant evolution and application across various domains. Initially, we introduce the fundamental concepts and historical progression of KD methods. The advantages of employing KD are highlighted, particularly in terms of model compression, enhanced computational efficiency, and improved performance, which are pivotal for practical deployments in RS scenarios. The article provides a comprehensive taxonomy of KD techniques, where each category is critically analyzed to demonstrate the breadth and depth of the alternative options, and illustrates specific case studies that showcase the practical implementation of KD methods in RS tasks, such as instance segmentation and object detection. Further, the review discusses the challenges and limitations of KD in RS, including practical constraints and prospective future directions, providing a comprehensive overview for researchers and practitioners in the field of RS. Through this organization, the paper not only elucidates the current state of research in KD but also sets the stage for future research opportunities, thereby contributing significantly to both academic research and real-world applications.
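As a reference point for the response-based distillation that most of the surveyed methods build on, here is a minimal sketch of the classic soft-target loss (Hinton et al.); it is illustrative only, and the 10-class toy setup is an assumption rather than anything taken from the review.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Response-based KD: weighted sum of the hard-label cross-entropy and the
    temperature-softened KL term that transfers the teacher's soft predictions."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature ** 2
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Toy usage: random logits for a hypothetical 10-class scene-classification head
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```

Feature-based and relation-based variants discussed in the taxonomy add further terms on top of this objective.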
https://arxiv.org/abs/2409.12111
The more than 200,000 glaciers outside the ice sheets play a crucial role in our society by influencing sea-level rise, water resource management, natural hazards, biodiversity, and tourism. However, only a fraction of these glaciers benefit from consistent and detailed in-situ observations that allow for assessing their status and changes over time. This limitation can, in part, be overcome by relying on satellite-based Earth Observation techniques. Satellite-based glacier mapping applications have historically mainly relied on manual and semi-automatic detection methods, while recently, a fast and notable transition to deep learning techniques has started. This chapter reviews how combining multi-sensor remote sensing data and deep learning allows us to better delineate (i.e. map) glaciers and detect their temporal changes. We explain how relying on deep learning multi-sensor frameworks to map glaciers benefits from the extensive availability of regional and global glacier inventories. We also analyse the rationale behind glacier mapping, the benefits of deep learning methodologies, and the inherent challenges in integrating multi-sensor earth observation data with deep learning algorithms. While our review aims to provide a broad overview of glacier mapping efforts, we highlight a few setups where deep learning multi-sensor remote sensing applications have a considerable potential added value. This includes applications for debris-covered and rock glaciers that are visually difficult to distinguish from surroundings and for calving glaciers that are in contact with the ocean. These specific cases are illustrated through a series of visual imageries, highlighting some significant advantages and challenges when detecting glacier changes, including dealing with seasonal snow cover, changing debris coverage, and distinguishing glacier fronts from the surrounding sea ice.
https://arxiv.org/abs/2409.12034
The use of data-driven methods in fluid mechanics has surged dramatically in recent years due to their capacity to adapt to the complex and multi-scale nature of turbulent flows, as well as to detect patterns in large-scale simulations or experimental tests. In order to interpret the relationships generated in the models during the training process, numerical attributions need to be assigned to the input features. One important example is the family of additive-feature-attribution methods. These explainability methods link the input features with the model prediction, providing an interpretation based on a linear formulation of the models. The SHapley Additive exPlanations (SHAP values) are formulated as the only such interpretation that offers a unique solution for understanding the model. In this manuscript, the additive-feature-attribution methods are presented, showing four common implementations in the literature: kernel SHAP, tree SHAP, gradient SHAP, and deep SHAP. Then, the main applications of the additive-feature-attribution methods are introduced, dividing them into three main groups: turbulence modeling, fluid-mechanics fundamentals, and applied problems in fluid dynamics and heat transfer. This review shows that explainability techniques, and in particular additive-feature-attribution methods, are crucial for implementing interpretable and physics-compliant deep-learning models in the fluid-mechanics field.
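A minimal, self-contained example of additive feature attribution on a toy regression problem is sketched below; it uses the open-source shap package's TreeExplainer and checks the additivity property (base value plus attributions approximately recovers the prediction). The data and feature setup are invented for illustration, not taken from the manuscript.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Toy regression stand-in for a quantity predicted from a few local flow features
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] - X[:, 1] ** 2 + 0.1 * rng.normal(size=300)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Tree SHAP: additive feature attributions for tree ensembles
explainer = shap.TreeExplainer(model)
phi = explainer.shap_values(X[:5])                      # shape (5, 3): one value per feature
base = float(np.asarray(explainer.expected_value).ravel()[0])

# Additivity: base value + sum of attributions ~ model prediction
print(np.round(base + phi.sum(axis=1), 3))
print(np.round(model.predict(X[:5]), 3))
```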
https://arxiv.org/abs/2409.11992
Recently, AI systems have made remarkable progress in various tasks. Deep Reinforcement Learning (DRL) is an effective tool for agents to learn policies in low-level state spaces to solve highly complex tasks. Researchers have introduced Intrinsic Motivation (IM) into the RL mechanism, which simulates the agent's curiosity, encouraging agents to explore interesting areas of the environment. This new feature has proved vital in enabling agents to learn policies without being given specific goals. However, even though DRL intelligence emerges through a sub-symbolic model, there is still a need for a sort of abstraction to understand the knowledge collected by the agent. To this end, the classical planning formalism has been used in recent research to explicitly represent the knowledge an autonomous agent acquires and effectively reach extrinsic goals. Although classical planning usually offers limited expressive capabilities, PPDDL has demonstrated its usefulness for reviewing the knowledge gathered by an autonomous system, making causal correlations explicit, and finding a plan to reach any state the agent faces during its experience. This work presents a new architecture implementing an open-ended learning system able to synthesize its experience from scratch into a PPDDL representation and update it over time. Without a predefined set of goals and tasks, the system integrates intrinsic motivations to explore the environment in a self-directed way, exploiting the high-level knowledge acquired during its experience. The system explores the environment and iteratively (a) discovers options, (b) explores the environment using those options, (c) abstracts the knowledge collected, and (d) plans. This paper proposes an alternative approach to implementing open-ended learning architectures that exploit low-level and high-level representations to extend their knowledge in a virtuous loop.
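Purely as a reading aid, the skeleton below spells out the (a)-(d) loop named in the abstract; every callable and the `sample_interesting_state` method are hypothetical placeholders, not components of the proposed architecture.

```python
def open_ended_learning_loop(env, discover_options, explore, abstract_to_ppddl, plan,
                             n_iterations=10):
    """Skeleton of the iterative loop described above (illustrative only)."""
    options, transitions, symbolic_model = [], [], None
    for _ in range(n_iterations):
        options += discover_options(env, transitions)     # (a) intrinsically motivated option discovery
        transitions += explore(env, options)               # (b) exploration using the current options
        symbolic_model = abstract_to_ppddl(transitions)    # (c) abstract experience into PPDDL-like operators
        goal = symbolic_model.sample_interesting_state()   # self-generated goal (no extrinsic task)
        plan(symbolic_model, goal)                         # (d) plan over the abstract model
    return symbolic_model
```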
https://arxiv.org/abs/2409.11756
In recent years, Light Detection and Ranging (LiDAR) technology, a critical sensor in robotics and autonomous systems, has seen significant advancements. These improvements include enhanced resolution of point clouds and the capability to provide 360° low-resolution images. These images encode various data such as depth, reflectivity, and near-infrared light within the pixels. However, an excessive density of points and conventional point cloud sampling can be counterproductive, particularly in applications such as LiDAR odometry, where misleading points and degraded geometry information may induce drift errors. Currently, extensive research efforts are being directed towards leveraging LiDAR-generated images to improve situational awareness. This paper presents a comprehensive review of current deep learning (DL) techniques, including colorization and super-resolution, which are traditionally utilized in conventional computer vision tasks. These techniques are applied to LiDAR-generated images and are analyzed qualitatively. Based on this analysis, we have developed a novel approach that selectively integrates the most suited colorization and super-resolution methods with LiDAR imagery to sample reliable points from the LiDAR point cloud. This approach aims to not only improve the accuracy of point cloud registration but also avoid mismatching caused by lacking geometry information, thereby augmenting the utility and precision of LiDAR systems in practical applications. In our evaluation, the proposed approach demonstrates superior performance compared to our previous work, achieving lower translation and rotation errors with a reduced number of points.
https://arxiv.org/abs/2409.11532
In this manuscript I present an analysis of the performance of the OpenAI O1-preview model in solving random K-SAT instances for $K \in \{2,3,4\}$ as a function of $\alpha = M/N$, where $M$ is the number of clauses and $N$ is the number of variables of the satisfiability problem. I show that the model can call an external SAT solver to solve the instances, rather than solving them directly. Despite using external solvers, the model reports incorrect assignments as output. Moreover, I propose and present an analysis to quantify whether the OpenAI O1-preview model demonstrates a spark of intelligence or merely makes random guesses when outputting an assignment for a Boolean satisfiability problem.
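For readers who want to mirror the experimental setup, the snippet below generates random K-SAT instances parameterized by $\alpha = M/N$ and verifies a reported assignment; it is a generic sketch, not the manuscript's code.

```python
import random

def random_ksat(n_vars, alpha, k=3, seed=0):
    """Random k-SAT instance with M = round(alpha * N) clauses; each clause has k
    distinct variables, each negated with probability 1/2."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(int(round(alpha * n_vars))):
        variables = rng.sample(range(1, n_vars + 1), k)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses

def check_assignment(clauses, assignment):
    """A reported assignment (dict: variable -> bool) is correct only if every
    clause contains at least one satisfied literal."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

# Example: a 3-SAT instance near the satisfiability threshold (alpha ~ 4.27)
instance = random_ksat(n_vars=50, alpha=4.2, k=3)
guess = {v: random.random() < 0.5 for v in range(1, 51)}
print(check_assignment(instance, guess))
```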
https://arxiv.org/abs/2409.11232
One application area of long-term memory (LTM) capabilities with increasing traction is personal AI companions and assistants. With the ability to retain and contextualize past interactions and adapt to user preferences, personal AI companions and assistants promise a profound shift in how we interact with AI and are on track to become indispensable in personal and professional settings. However, this advancement introduces new challenges and vulnerabilities that require careful consideration regarding the deployment and widespread use of these systems. The goal of this paper is to explore the broader implications of building and deploying personal AI applications with LTM capabilities using a holistic evaluation approach. This will be done in three ways: 1) reviewing the technological underpinnings of LTM in Large Language Models, 2) surveying current personal AI companions and assistants, and 3) analyzing critical considerations and implications of deploying and using these applications.
https://arxiv.org/abs/2409.11192
Diffusion models have achieved remarkable progress in generative modelling, particularly in enhancing image quality to conform to human preferences. Recently, these models have also been applied to low-level computer vision for photo-realistic image restoration (IR) in tasks such as image denoising, deblurring, dehazing, etc. In this review paper, we introduce key constructions in diffusion models and survey contemporary techniques that make use of diffusion models in solving general IR tasks. Furthermore, we point out the main challenges and limitations of existing diffusion-based IR frameworks and provide potential directions for future work.
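For orientation, the key construction these restoration methods build on is the standard denoising-diffusion setup: a fixed forward noising process and a learned noise predictor, which IR methods typically condition on the degraded observation y. The notation below follows the usual DDPM formulation and is not taken verbatim from the paper:

```latex
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\right),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)

\mathcal{L}_{\text{simple}} = \mathbb{E}_{x_0,\ \epsilon \sim \mathcal{N}(0, I),\ t}
\left[ \left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t,\ y\right) \right\|^2 \right]
```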
https://arxiv.org/abs/2409.10353
Retrieval-Augmented Generation (RAG) has quickly grown into a pivotal paradigm in the development of Large Language Models (LLMs). While much of the current research in this field focuses on performance optimization, particularly in terms of accuracy and efficiency, the trustworthiness of RAG systems remains an area still under exploration. From a positive perspective, RAG systems are promising to enhance LLMs by providing them with useful and up-to-date knowledge from vast external databases, thereby mitigating the long-standing problem of hallucination. While from a negative perspective, RAG systems are at the risk of generating undesirable contents if the retrieved information is either inappropriate or poorly utilized. To address these concerns, we propose a unified framework that assesses the trustworthiness of RAG systems across six key dimensions: factuality, robustness, fairness, transparency, accountability, and privacy. Within this framework, we thoroughly review the existing literature on each dimension. Additionally, we create the evaluation benchmark regarding the six dimensions and conduct comprehensive evaluations for a variety of proprietary and open-source models. Finally, we identify the potential challenges for future research based on our investigation results. Through this work, we aim to lay a structured foundation for future investigations and provide practical insights for enhancing the trustworthiness of RAG systems in real-world applications.
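As a point of reference for where these risks enter, here is a deliberately minimal retrieve-then-prompt sketch; the TF-IDF retriever, document snippets, and prompt wording are illustrative stand-ins, and the LLM call itself is omitted.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve(query, documents, top_k=2):
    """Toy sparse retriever (real RAG systems typically use dense retrievers)."""
    vectorizer = TfidfVectorizer().fit(documents + [query])
    scores = cosine_similarity(vectorizer.transform([query]),
                               vectorizer.transform(documents))[0]
    ranked = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [documents[i] for i in ranked[:top_k]]

def build_rag_prompt(query, documents):
    """Ground the generator in retrieved evidence; factuality and transparency
    hinge on the model staying within (and citing) this context."""
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(retrieve(query, documents)))
    return (f"Answer using only the sources below and cite them.\n"
            f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:")

docs = ["RAG augments an LLM with retrieved evidence.",
        "Hallucination means generating claims unsupported by any source.",
        "Dense retrievers embed queries and passages into a shared vector space."]
print(build_rag_prompt("How does RAG reduce hallucination?", docs))
```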
https://arxiv.org/abs/2409.10102
ChatGPT has long been proven to be effective in automatic program repair (APR). With the continuous iterations and upgrades of the ChatGPT version, its performance in terms of fixes has already reached state-of-the-art levels. However, there are few works comparing the effectiveness and variations of different versions of ChatGPT on APR. In this work, we evaluate the performance of the latest version of ChatGPT (O1-preview and O1-mini), ChatGPT-4o, and historical version of ChatGPT on APR. We study the improvements of the O1 model over traditional ChatGPT in terms of APR from multiple perspectives (repair success rate, repair cost, behavior patterns), and find that O1's repair capability exceeds that of traditional ChatGPT, successfully fixing all 40 bugs in the benchmark. Our work can serve as a reference for further in-depth exploration of the applications of ChatGPT in APR.
https://arxiv.org/abs/2409.10033
This paper provides a comprehensive survey of sentiment analysis within the context of artificial intelligence (AI) and large language models (LLMs). Sentiment analysis, a critical aspect of natural language processing (NLP), has evolved significantly from traditional rule-based methods to advanced deep learning techniques. This study examines the historical development of sentiment analysis, highlighting the transition from lexicon-based and pattern-based approaches to more sophisticated machine learning and deep learning models. Key challenges are discussed, including handling bilingual texts, detecting sarcasm, and addressing biases. The paper reviews state-of-the-art approaches, identifies emerging trends, and outlines future research directions to advance the field. By synthesizing current methodologies and exploring future opportunities, this survey aims to understand sentiment analysis in the AI and LLM context thoroughly.
https://arxiv.org/abs/2409.09989
Coronary artery calcium (CAC) is highly predictive of cardiovascular events. While millions of chest CT scans are performed annually in the United States, CAC is not routinely quantified from scans done for non-cardiac purposes. A deep learning algorithm was developed using 446 expert segmentations to automatically quantify CAC on non-contrast, non-gated CT scans (AI-CAC). Our study differs from prior works as we leverage imaging data across the Veterans Affairs national healthcare system, from 98 medical centers, capturing extensive heterogeneity in imaging protocols, scanners, and patients. AI-CAC performance on non-gated scans was compared against clinical standard ECG-gated CAC scoring. Non-gated AI-CAC differentiated zero vs. non-zero and less than 100 vs. 100 or greater Agatston scores with accuracies of 89.4% (F1 0.93) and 87.3% (F1 0.89), respectively, in 795 patients with paired gated scans within a year of a non-gated CT scan. Non-gated AI-CAC was predictive of 10-year all-cause mortality (CAC 0 vs. >400 group: 25.4% vs. 60.2%, Cox HR 3.49, p < 0.005), and composite first-time stroke, MI, or death (CAC 0 vs. >400 group: 33.5% vs. 63.8%, Cox HR 3.00, p < 0.005). In a screening dataset of 8,052 patients with low-dose lung cancer-screening CTs (LDCT), 3,091/8,052 (38.4%) individuals had AI-CAC >400. Four cardiologists qualitatively reviewed LDCT images from a random sample of >400 AI-CAC patients and verified that 527/531 (99.2%) would benefit from lipid-lowering therapy. To the best of our knowledge, this is the first non-gated CT CAC algorithm developed across a national healthcare system, on multiple imaging protocols, without filtering intra-cardiac hardware, and compared against a strong gated CT reference. We report superior performance relative to previous CAC algorithms evaluated against paired gated scans that included patients with intra-cardiac hardware.
https://arxiv.org/abs/2409.09968
Graph anomaly detection (GAD), which aims to identify unusual graph instances (nodes, edges, subgraphs, or graphs), has attracted increasing attention in recent years due to its significance in a wide range of applications. Deep learning approaches, graph neural networks (GNNs) in particular, have been emerging as a promising paradigm for GAD, owing to their strong capability in capturing complex structure and/or node attributes in graph data. Considering the large number of methods proposed for GNN-based GAD, it is of paramount importance to summarize the methodologies and findings in the existing GAD studies, so that we can pinpoint effective model designs for tackling open GAD problems. To this end, in this work we aim to present a comprehensive review of deep learning approaches for GAD. Existing GAD surveys are focused on task-specific discussions, making it difficult to understand the technical insights of existing methods and their limitations in addressing some unique challenges in GAD. To fill this gap, we first discuss the problem complexities and their resulting challenges in GAD, and then provide a systematic review of current deep GAD methods from three novel perspectives of methodology, including GNN backbone design, proxy task design for GAD, and graph anomaly measures. To deepen the discussion, we further propose a taxonomy of 13 fine-grained method categories under these three perspectives to provide more in-depth insights into the model designs and their capabilities. To facilitate the experiments and validation, we also summarize a collection of widely used GAD datasets and an empirical comparison. We further discuss multiple open problems to inspire more future high-quality research. A continuously updated repository for datasets, links to the codes of algorithms, and empirical comparison is available at this https URL.
https://arxiv.org/abs/2409.09957
In this paper we present a new machine learning workflow with unsupervised learning techniques to identify domains within atomic force microscopy (AFM) images obtained from polymer films. The goal of the workflow is to identify the spatial locations of the two types of polymer domains with little to no manual intervention and to calculate the domain size distributions, which in turn can help classify the phase-separated state of the material as macrophase- or microphase-ordered or disordered domains. We briefly review existing approaches from other fields, namely computer vision and signal processing, that are applicable to these tasks, which arise frequently in polymer science and engineering. We then test these computer-vision and signal-processing approaches on our AFM image dataset to identify the strengths and limitations of each for the first task. For the domain segmentation task, we found that a workflow using the discrete Fourier transform (DFT) or discrete cosine transform (DCT) with variance statistics as the feature works best. The popular ResNet50 deep learning approach from the computer vision field exhibited relatively poorer performance on domain segmentation for our AFM images compared to the DFT- and DCT-based workflows. For the second task, for each of the 144 input AFM images, we used the existing porespy python package to calculate the domain size distribution from the output of the DFT-based workflow. The information and open-source code we share in this paper can serve as a guide for researchers in the polymer and soft-materials fields who need ML models and workflows for automated analysis of AFM images from polymer samples that may have crystalline or amorphous domains, sharp or rough interfaces between domains, or microphase- or macrophase-separated domains.
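The sketch below conveys the flavour of the transform-plus-variance segmentation step on a synthetic two-texture image; the patch size, feature choice, and use of k-means are simplifications of ours, and the connected-component size count merely stands in for the porespy-based domain size distribution used in the paper.

```python
import numpy as np
from scipy.fft import dctn
from scipy.ndimage import label
from sklearn.cluster import KMeans

def segment_domains(image, patch=16):
    """Unsupervised two-phase segmentation: per-patch DCT coefficients summarised
    by variance statistics, then k-means with two clusters (illustrative only)."""
    h, w = (image.shape[0] // patch) * patch, (image.shape[1] // patch) * patch
    feats, coords = [], []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            block = image[i:i + patch, j:j + patch]
            feats.append([np.var(dctn(block, norm="ortho")), np.var(block)])
            coords.append((i // patch, j // patch))
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.array(feats))
    mask = np.zeros((h // patch, w // patch), dtype=int)
    for (r, c), lab in zip(coords, labels):
        mask[r, c] = lab
    return mask

def domain_sizes(mask, phase=1):
    """Crude size statistics (in patch units) via connected components."""
    labelled, n = label(mask == phase)
    return [int((labelled == k).sum()) for k in range(1, n + 1)]

# Synthetic AFM-like image: two textures of different roughness
rng = np.random.default_rng(0)
img = rng.normal(size=(128, 128))
img[:, 64:] *= 4.0
print(domain_sizes(segment_domains(img)))
```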
https://arxiv.org/abs/2409.11438
Graph machine learning (GML) has been successfully applied across a wide range of tasks. Nonetheless, GML faces significant challenges in generalizing over out-of-distribution (OOD) data, which raises concerns about its wider applicability. Recent advancements have underscored the crucial role of causality-driven approaches in overcoming these generalization challenges. Distinct from traditional GML methods that primarily rely on statistical dependencies, causality-focused strategies delve into the underlying causal mechanisms of data generation and model prediction, thus significantly improving the generalization of GML across different environments. This paper offers a thorough review of recent progress in causality-involved GML generalization. We elucidate the fundamental concepts of employing causality to enhance graph model generalization and categorize the various approaches, providing detailed descriptions of their methodologies and the connections among them. Furthermore, we explore the incorporation of causality in other related important areas of trustworthy GML, such as explanation, fairness, and robustness. Concluding with a discussion on potential future research directions, this review seeks to articulate the continuing development and future potential of causality in enhancing the trustworthiness of graph machine learning.
https://arxiv.org/abs/2409.09858
Causal inference has been a pivotal challenge across diverse domains such as medicine and economics, demanding a complicated integration of human knowledge, mathematical reasoning, and data mining capabilities. Recent advancements in natural language processing (NLP), particularly with the advent of large language models (LLMs), have introduced promising opportunities for traditional causal inference tasks. This paper reviews recent progress in applying LLMs to causal inference, encompassing various tasks spanning different levels of causation. We summarize the main causal problems and approaches, and present a comparison of their evaluation results in different causal scenarios. Furthermore, we discuss key findings and outline directions for future research, underscoring the potential implications of integrating LLMs in advancing causal inference methodologies.
https://arxiv.org/abs/2409.09822
Recent advancements in deep learning, particularly large language models (LLMs), have made a significant impact on how researchers study microbiome and metagenomics data. Microbial protein and genomic sequences, like natural languages, form a language of life, enabling the adoption of LLMs to extract useful insights from complex microbial ecologies. In this paper, we review applications of deep learning and language models in analyzing microbiome and metagenomics data. We focus on problem formulations, necessary datasets, and the integration of language modeling techniques. We provide an extensive overview of protein/genomic language modeling and its contributions to microbiome studies. We also discuss applications such as novel viromics language modeling, biosynthetic gene cluster prediction, and knowledge integration for metagenomics studies.
https://arxiv.org/abs/2409.10579
The aim of this paper is to analyze methods of flexible control in SDN networks and to propose a self-developed solution that enables intelligent adaptation of SDN controller performance. This work aims not only to review existing solutions, but also to develop an approach that increases the efficiency and adaptability of network management. The project uses a modern type of machine learning, Reinforcement Learning, which allows a network to make autonomous decisions and learn from its choices in a dynamically changing environment, much as humans learn. The solution aims to improve not only the network's performance, but also its flexibility and real-time adaptability - flexible traffic control.
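To indicate the kind of learning loop involved, here is a toy tabular Q-learning sketch for picking a traffic-control action per load state; the states, actions, and reward are invented for illustration, and `reward_fn`/`transition_fn` stand in for measurements taken from the SDN controller rather than the paper's design.

```python
import random

def q_learning_controller(states, actions, reward_fn, transition_fn,
                          episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Minimal tabular Q-learning: learn which action to take in each load state."""
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(20):                                    # bounded episode length
            a = (random.choice(actions) if random.random() < epsilon
                 else max(actions, key=lambda x: q[(s, x)]))   # epsilon-greedy policy
            s_next, r = transition_fn(s, a), reward_fn(s, a)
            best_next = max(q[(s_next, b)] for b in actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q

# Hypothetical example: coarse load levels as states, routing policies as actions
states, actions = ["low", "medium", "high"], ["shortest_path", "load_balance"]
reward_fn = lambda s, a: 1.0 if (a == "load_balance") == (s == "high") else -1.0
transition_fn = lambda s, a: random.choice(states)
q = q_learning_controller(states, actions, reward_fn, transition_fn)
print(max(actions, key=lambda a: q[("high", a)]))
```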
https://arxiv.org/abs/2409.11436
Federated learning holds great potential for enabling large-scale healthcare research and collaboration across multiple centres while ensuring data privacy and security are not compromised. Although numerous recent studies suggest or utilize federated learning-based methods in healthcare, it remains unclear which ones have potential clinical utility. This review paper considers and analyzes the most recent studies, up to May 2024, that describe federated learning-based methods in healthcare. After a thorough review, we find that the vast majority are not appropriate for clinical use due to their methodological flaws and/or underlying biases, which include but are not limited to privacy concerns, generalization issues, and communication costs. As a result, the effectiveness of federated learning in healthcare is significantly compromised. To overcome these challenges, we provide recommendations and promising opportunities that might be implemented to resolve these problems and improve the quality of model development in federated learning for healthcare.
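For context, the aggregation step at the heart of most reviewed methods is federated averaging (FedAvg, McMahan et al.): each centre trains locally and only model parameters are averaged, weighted by local cohort size. The sketch below uses synthetic weights from three hypothetical hospitals and is not tied to any specific study in the review.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Federated averaging: combine per-client parameter lists, weighted by the
    number of local samples, without ever sharing the raw patient data."""
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    return [sum((n / total) * w[i] for w, n in zip(client_weights, client_sizes))
            for i in range(n_layers)]

# Three hypothetical hospitals with very different cohort sizes
rng = np.random.default_rng(0)
clients = [[rng.normal(size=(4, 4)), rng.normal(size=4)] for _ in range(3)]
sizes = [1200, 300, 80]
global_model = fedavg(clients, sizes)
print(global_model[0].shape, global_model[1].shape)
```

The communication-cost, bias, and generalization concerns raised in the review arise precisely around how often, and over which cohorts, this averaging is performed.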
https://arxiv.org/abs/2409.09727