In the fast-evolving domain of artificial intelligence, large language models (LLMs) such as GPT-3 and GPT-4 are revolutionizing the landscapes of finance, healthcare, and law: domains characterized by their reliance on professional expertise, challenging data acquisition, high stakes, and stringent regulatory compliance. This survey offers a detailed exploration of the methodologies, applications, challenges, and forward-looking opportunities of LLMs within these high-stakes sectors. We highlight the instrumental role of LLMs in enhancing diagnostic and treatment methodologies in healthcare, innovating financial analytics, and refining legal interpretation and compliance strategies. Moreover, we critically examine the ethics of LLM applications in these fields, pointing out existing ethical concerns and the need for transparent, fair, and robust AI systems that respect regulatory norms. By presenting a thorough review of the current literature and practical applications, we showcase the transformative impact of LLMs and outline the imperative for interdisciplinary cooperation, methodological advancements, and ethical vigilance. Through this lens, we aim to spark dialogue and inspire future research dedicated to maximizing the benefits of LLMs while mitigating their risks in these precision-dependent sectors. To facilitate future research on LLMs in these critical societal domains, we also initiate a reading list that tracks the latest advancements under this topic, which will be continually updated: \url{this https URL}.
https://arxiv.org/abs/2405.01769
Unmanned Aerial Vehicles (UAVs) have emerged as a transformative technology across diverse sectors, offering adaptable solutions to complex challenges in both military and civilian domains. Their expanding capabilities present a platform for further advancement by integrating cutting-edge computational tools like Artificial Intelligence (AI) and Machine Learning (ML) algorithms. These advancements have significantly impacted various facets of human life, fostering an era of unparalleled efficiency and convenience. Large Language Models (LLMs), a key component of AI, exhibit remarkable learning and adaptation capabilities within deployed environments, demonstrating an evolving form of intelligence with the potential to approach human-level proficiency. This work explores the significant potential of integrating UAVs and LLMs to propel the development of autonomous systems. We comprehensively review LLM architectures, evaluating their suitability for UAV integration. Additionally, we summarize the state-of-the-art LLM-based UAV architectures and identify novel opportunities for LLM embedding within UAV frameworks. Notably, we focus on leveraging LLMs to refine data analysis and decision-making processes, specifically for enhanced spectral sensing and sharing in UAV applications. Furthermore, we investigate how LLM integration expands the scope of existing UAV applications, enabling autonomous data processing, improved decision-making, and faster response times in emergency scenarios like disaster response and network restoration. Finally, we highlight crucial areas for future research that are critical for facilitating the effective integration of LLMs and UAVs.
https://arxiv.org/abs/2405.01745
Oversmoothing is a commonly observed challenge in graph neural network (GNN) learning, where, as layers increase, the embedding features learned by GNNs quickly become similar and indistinguishable, rendering them incapable of differentiating network proximity. A GNN with a shallow-layer architecture can only learn short-range relations or localized structural information, limiting its power to learn long-range connections, as evidenced by inferior learning performance on heterophilous graphs. Tackling oversmoothing is therefore crucial to harnessing deep-layer architectures for GNNs. To date, many methods have been proposed to alleviate oversmoothing. The vast differences in their design principles, combined with graph complications, make it difficult to understand, let alone compare, how they tackle oversmoothing. In this paper, we propose ATNPA, a unified view with five key steps: Augmentation, Transformation, Normalization, Propagation, and Aggregation, to summarize approaches for alleviating GNN oversmoothing. We first outline three themes for tackling oversmoothing, then separate all methods into six categories, followed by detailed reviews of representative methods, including their relation to ATNPA and discussion of their niche, strengths, and weaknesses. The review not only provides an in-depth understanding of existing methods in the field but also charts a clear road map for future study.
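The oversmoothing phenomenon this survey addresses can be reproduced in a few lines. The sketch below, with an invented 5-node path graph and random features, repeatedly applies GCN-style symmetric-normalized propagation (the Propagation step of ATNPA, with no learnable Transformation) and shows that the average pairwise distance between node embeddings collapses as layers increase:

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, the GCN-style propagation matrix."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def mean_pairwise_distance(X):
    n = X.shape[0]
    dists = [np.linalg.norm(X[i] - X[j]) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

# Toy 5-node path graph with random 8-dimensional features.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
A_hat = normalized_adjacency(A)

spread = [mean_pairwise_distance(X)]
for _ in range(20):
    X = A_hat @ X  # propagation only; features drift toward a common vector
    spread.append(mean_pairwise_distance(X))

print(f"embedding spread after 0 layers:  {spread[0]:.3f}")
print(f"embedding spread after 20 layers: {spread[-1]:.3f}")
```

After twenty propagation steps the spread shrinks by an order of magnitude, which is exactly why deep stacks of such layers lose the ability to differentiate nodes.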
https://arxiv.org/abs/2405.01663
Meticulous 3D environment representations have been a longstanding goal in the computer vision and robotics fields. The recent emergence of neural implicit representations has introduced radical innovation to this field, as implicit representations enable numerous new capabilities. Among these, the Neural Radiance Field (NeRF) has sparked a trend because of its substantial representational advantages, such as simplified mathematical models, compact environment storage, and continuous scene representations. Beyond computer vision, NeRF has also shown tremendous potential in the field of robotics. Thus, we present this survey to provide a comprehensive understanding of NeRF in the field of robotics. By exploring the advantages and limitations of NeRF, as well as its current applications and future potential, we hope to shed light on this promising area of research. Our survey is divided into two main sections: \textit{The Application of NeRF in Robotics} and \textit{The Advance of NeRF in Robotics}, organized from the perspective of how NeRF enters the field of robotics. In the first section, we introduce and analyze works that have been or could be used in robotics from the perception and interaction perspectives. In the second section, we present works related to improving NeRF's own properties, which are essential for deploying NeRF in robotics. In the discussion section of the review, we summarize the existing challenges and provide valuable future research directions for reference.
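As a concrete instance of the "simplified mathematical models" credited to NeRF, the standard discrete volume-rendering equation from the original NeRF formulation (stated here as background, not drawn from any of the surveyed robotics works) expresses a ray's predicted color as an alpha-composited sum over samples along the ray:
\[
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\!\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right),
\]
where $\sigma_i$ and $\mathbf{c}_i$ are the density and color predicted at sample $i$, $\delta_i$ is the distance between adjacent samples, and $T_i$ is the accumulated transmittance. The entire scene is thus stored in the weights of one network, which is the compactness property robotics applications exploit.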
https://arxiv.org/abs/2405.01333
Prompt engineering is crucial for harnessing the potential of large language models (LLMs), especially in the medical domain, where specialized terminology and phrasing are used. However, the efficacy of prompt engineering in the medical domain remains underexplored. In this work, we review 114 recent studies (2022-2024) applying prompt engineering in medicine, covering prompt learning (PL), prompt tuning (PT), and prompt design (PD). PD is the most prevalent approach (78 articles). In 12 papers, the terms PD, PL, and PT were used interchangeably. ChatGPT is the most commonly used LLM, with seven papers using it to process sensitive clinical data. Chain-of-Thought emerges as the most common prompt engineering technique. While PL and PT articles typically provide a baseline for evaluating prompt-based approaches, 64% of PD studies lack non-prompt-related baselines. We provide tables and figures summarizing existing work, along with reporting recommendations to guide future research contributions.
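To make the prompt-design (PD) category concrete, here is a minimal Chain-of-Thought prompt template for a clinical question. The wording, helper name, and question are invented for illustration and are not taken from any of the reviewed papers:

```python
# Minimal sketch of a Chain-of-Thought prompt-design template.
# The system framing and trigger phrase are illustrative assumptions.

def build_cot_prompt(question: str) -> str:
    return (
        "You are a careful medical assistant.\n"
        f"Question: {question}\n"
        "Let's think step by step before giving the final answer."
    )

prompt = build_cot_prompt("Which lab values suggest iron-deficiency anemia?")
print(prompt)
```

The "Let's think step by step" trigger is the canonical zero-shot Chain-of-Thought phrasing; PD studies vary this surface form, whereas PL and PT instead learn soft or discrete prompt parameters.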
https://arxiv.org/abs/2405.01249
Whistleblowing is essential for ensuring transparency and accountability in both the public and private sectors. However, (potential) whistleblowers often fear or face retaliation, even when reporting anonymously. The specific content of their disclosures and their distinct writing style may re-identify them as the source. Legal measures, such as the EU WBD, are limited in their scope and effectiveness. Therefore, computational methods to prevent re-identification are important complementary tools for encouraging whistleblowers to come forward. However, current text sanitization tools follow a one-size-fits-all approach and take an overly limited view of anonymity. They aim to mitigate identification risk by replacing typical high-risk words (such as person names and other named-entity labels) and combinations thereof with placeholders. Such an approach, however, is inadequate for the whistleblowing scenario, since it neglects further re-identification potential in textual features, including writing style. Therefore, we propose, implement, and evaluate a novel classification and mitigation strategy for rewriting texts that involves the whistleblower in the assessment of risk and utility. Our prototypical tool semi-automatically evaluates risk at the word/term level and applies risk-adapted anonymization techniques to produce a grammatically disjointed yet appropriately sanitized text. We then use an LLM that we fine-tuned for paraphrasing to render this text coherent and style-neutral. We evaluate our tool's effectiveness using court cases from the ECHR and excerpts from a real-world whistleblower testimony, and measure the protection against authorship attribution (AA) attacks and utility loss statistically using the popular IMDb62 movie reviews dataset. Our method can significantly reduce AA accuracy, from 98.81% to 31.22%, while preserving up to 73.1% of the original content's semantics.
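The baseline "one-size-fits-all" sanitization the abstract criticizes can be sketched in a few lines: replace known high-risk named-entity terms with placeholders. The entity list and report text below are invented for illustration; the paper's actual tool goes further, scoring risk per term and then paraphrasing the result with a fine-tuned LLM:

```python
import re

# Sketch of placeholder-based sanitization. The term list is a toy
# stand-in for the output of a named-entity recognizer.
HIGH_RISK_TERMS = {
    "Jane Doe": "[PERSON]",
    "Acme Corp": "[ORG]",
    "Berlin": "[LOCATION]",
}

def sanitize(text: str) -> str:
    for term, placeholder in HIGH_RISK_TERMS.items():
        text = re.sub(re.escape(term), placeholder, text)
    return text

report = "Jane Doe of Acme Corp observed the fraud at the Berlin office."
print(sanitize(report))
```

Note what this does not remove: sentence length, word choice, and other stylometric features survive intact, which is precisely the residual re-identification channel the paper's style-neutral paraphrasing stage targets.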
https://arxiv.org/abs/2405.01097
Wildfires have significant impacts on global vegetation, wildlife, and humans. They destroy plant communities and wildlife habitats and contribute to increased emissions of carbon dioxide, nitrogen oxides, methane, and other pollutants. The prediction of wildfires relies on various independent variables combined with regression or machine learning methods. In this technical review, we describe the options for independent variables, data processing techniques, models, methods for estimating independent-variable collinearity and importance, and model performance evaluation metrics. First, we divide the independent variables into four aspects: climate and meteorological conditions, socio-economic factors, terrain and hydrological features, and wildfire historical records. Second, preprocessing methods are described for data of different magnitudes, different spatio-temporal resolutions, and different formats. Third, methods for evaluating the collinearity and importance of independent variables are considered. Fourth, we discuss the application of statistical models, traditional machine learning models, and deep learning models in wildfire risk prediction. In this subsection, compared with other reviews, this manuscript particularly discusses evaluation metrics and recent advancements in deep learning methods. Lastly, addressing the limitations of current research, this paper emphasizes the need for more effective deep learning time series forecasting algorithms, the utilization of three-dimensional data including ground and trunk fuel, the extraction of more accurate historical fire point data, and improved model evaluation metrics.
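One preprocessing step the review describes, handling variables of very different magnitudes, can be illustrated with min-max scaling, a common choice when mixing covariates such as temperature and population counts before regression or ML. The variable names and values below are invented:

```python
import numpy as np

def min_max_scale(x):
    """Rescale a 1-D variable to the [0, 1] interval."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

temperature_c = [12.0, 25.0, 31.0, 40.0]          # small-magnitude variable
population = [1_200, 85_000, 430_000, 2_000_000]  # large-magnitude variable

# Stack the rescaled variables into one design matrix: both columns
# now span [0, 1], so neither dominates a distance- or gradient-based model.
scaled = np.column_stack([min_max_scale(temperature_c), min_max_scale(population)])
print(scaled.min(axis=0), scaled.max(axis=0))
```

Standardization (zero mean, unit variance) is the usual alternative; which one is appropriate depends on the downstream model.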
https://arxiv.org/abs/2405.01607
The availability of high-quality datasets is crucial for the development of behavior prediction algorithms in autonomous vehicles. This paper highlights the need for standardizing the use of certain datasets for motion forecasting research to simplify comparative analysis and proposes a set of tools and practices to achieve this. Drawing on extensive experience and a comprehensive review of current literature, we summarize our proposals for preprocessing, visualization, and evaluation in the form of an open-sourced toolbox designed for researchers working on trajectory prediction problems. The clear specification of necessary preprocessing steps and evaluation metrics is intended to alleviate development efforts and facilitate the comparison of results across different studies. The toolbox is available at: this https URL.
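Two evaluation metrics that are standard in trajectory prediction, and the kind of thing such a toolbox would pin down, are Average Displacement Error (ADE, the mean L2 error over all predicted timesteps) and Final Displacement Error (FDE, the L2 error at the last timestep). A minimal sketch with invented trajectories:

```python
import numpy as np

def ade(pred, gt):
    """Average Displacement Error: mean L2 distance over all timesteps."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def fde(pred, gt):
    """Final Displacement Error: L2 distance at the last timestep."""
    return float(np.linalg.norm(pred[-1] - gt[-1]))

# Toy ground-truth and predicted trajectories, shape (timesteps, xy).
gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
print(f"ADE = {ade(pred, gt):.3f}, FDE = {fde(pred, gt):.3f}")
```

Ambiguity in exactly such details (which timesteps are scored, how multi-modal predictions are aggregated) is one reason results across studies are hard to compare without a shared implementation.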
https://arxiv.org/abs/2405.00604
AI is revolutionizing MRI along the acquisition and processing chain. Advanced AI frameworks have been developed to apply AI to successive tasks such as image reconstruction, quantitative parameter map estimation, and image segmentation. Existing frameworks are often designed to perform tasks independently or are focused on specific models or datasets, limiting generalization. We introduce ATOMMIC, an open-source toolbox that streamlines AI applications for accelerated MRI reconstruction and analysis. ATOMMIC implements several tasks using DL networks and enables MultiTask Learning (MTL) to perform related tasks in an integrated manner, targeting generalization in the MRI domain. We first review the current state of AI frameworks for MRI through a comprehensive literature search and by parsing 12,479 GitHub repositories. We benchmark 25 DL models on eight publicly available datasets to present distinct applications of ATOMMIC on accelerated MRI reconstruction, image segmentation, quantitative parameter map estimation, and joint accelerated MRI reconstruction and image segmentation utilizing MTL. Our findings demonstrate that ATOMMIC is the only MTL framework with harmonized complex-valued and real-valued data support. Evaluations on single tasks show that physics-based models, which enforce data consistency by leveraging the physical properties of MRI, outperform other models in reconstructing highly accelerated acquisitions. Physics-based models that produce high reconstruction quality can accurately estimate quantitative parameter maps. When high-performing reconstruction models are combined with robust segmentation networks utilizing MTL, performance is improved in both tasks. ATOMMIC facilitates MRI reconstruction and analysis by standardizing workflows, enhancing data interoperability, integrating unique features like MTL, and effectively benchmarking DL models.
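The data-consistency principle behind the physics-based models that dominate the benchmark can be sketched in isolation: after a network proposes a reconstruction, its k-space (Fourier) coefficients at the sampled locations are replaced with the actually measured values, so the output can never contradict the acquisition. Everything below (the image, sampling mask, and "network output") is a synthetic stand-in, not ATOMMIC code:

```python
import numpy as np

rng = np.random.default_rng(0)
ground_truth = rng.normal(size=(16, 16))
kspace_full = np.fft.fft2(ground_truth)

mask = rng.random((16, 16)) < 0.4  # ~40% of k-space was actually sampled
measured = kspace_full * mask

# Stand-in for an imperfect network reconstruction.
network_output = ground_truth + 0.3 * rng.normal(size=(16, 16))

def data_consistency(image, measured_kspace, mask):
    """Overwrite the image's k-space at sampled locations with the measurements."""
    k = np.fft.fft2(image)
    k = np.where(mask, measured_kspace, k)  # keep measured samples verbatim
    return np.fft.ifft2(k)                  # complex image; magnitude shown in practice

dc_output = data_consistency(network_output, measured, mask)
print(np.allclose(np.fft.fft2(dc_output)[mask], measured[mask]))
```

Unrolled physics-based networks interleave learned denoising steps with exactly this kind of projection onto the measurements.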
https://arxiv.org/abs/2404.19665
This survey presents an overview of methods for learning from video (LfV) in the context of reinforcement learning (RL) and robotics. We focus on methods capable of scaling to large internet video datasets and, in the process, extracting foundational knowledge about the world's dynamics and physical human behaviour. Such methods hold great promise for developing general-purpose robots. We open with an overview of fundamental concepts relevant to the LfV-for-robotics setting. This includes a discussion of the exciting benefits LfV methods can offer (e.g., improved generalization beyond the available robot data) and commentary on key LfV challenges (e.g., challenges related to missing information in video and LfV distribution shifts). Our literature review begins with an analysis of video foundation model techniques that can extract knowledge from large, heterogeneous video datasets. Next, we review methods that specifically leverage video data for robot learning. Here, we categorise work according to which RL knowledge modality benefits from the use of video data. We additionally highlight techniques for mitigating LfV challenges, including reviewing action representations that address the issue of missing action labels in video. Finally, we examine LfV datasets and benchmarks, before concluding the survey by discussing challenges and opportunities in LfV. Here, we advocate for scalable approaches that can leverage the full range of available data and that target the key benefits of LfV. Overall, we hope this survey will serve as a comprehensive reference for the emerging field of LfV, catalysing further research in the area, and ultimately facilitating progress towards obtaining general-purpose robots.
https://arxiv.org/abs/2404.19664
In recent years, Artificial Intelligence (AI) has been widely used in medicine, particularly in the analysis of medical imaging, driven by advances in computer vision and deep learning methods. This is particularly important in overcoming the challenges posed by diseases such as Bone Metastases (BM), a common and complex malignancy of the bones. Indeed, there has been increasing interest in applying Machine Learning (ML) techniques to oncologic imaging for BM analysis. To provide a comprehensive overview of the current state of the art and advancements in BM analysis using artificial intelligence, this review is conducted in accordance with the PRISMA guidelines. First, this review highlights the clinical and oncologic perspectives of BM and the medical imaging modalities used, discussing their advantages and limitations. The review then focuses on modern approaches to the main BM analysis tasks, which include classification, detection, and segmentation. The results analysis shows that ML technologies can achieve promising performance for BM analysis and have significant potential to improve clinician efficiency and cope with time and cost limitations. Furthermore, further research is required to validate the clinical performance of ML tools and facilitate their integration into routine clinical practice.
https://arxiv.org/abs/2404.19598
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at this https URL.
https://arxiv.org/abs/2404.19534
Imitation learning is an approach in which an agent learns how to execute a task by trying to mimic how one or more teachers perform it. This learning approach offers a compromise between the time it takes to learn a new task and the effort needed to collect teacher samples for the agent. It achieves this by balancing learning from the teacher, who has some information on how to perform the task, and deviating from their examples when necessary, such as states not present in the teacher samples. Consequently, the field of imitation learning has received much attention from researchers in recent years, resulting in many new methods and applications. However, with this increase in published work and past surveys focusing mainly on methodology, a lack of standardisation became more prominent in the field. This non-standardisation is evident in the use of environments, which appear in no more than two works, and evaluation processes, such as qualitative analysis, that have become rare in current literature. In this survey, we systematically review current imitation learning literature and present our findings by (i) classifying imitation learning techniques, environments and metrics by introducing novel taxonomies; (ii) reflecting on main problems from the literature; and (iii) presenting challenges and future directions for researchers.
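The compromise described above, copying the teacher where demonstrations exist and deviating sensibly where they do not, can be illustrated with a minimal behavior-cloning sketch. The 1-D corridor task, the state-action table, and the nearest-state fallback are all invented for illustration and are not taken from any surveyed method:

```python
# Teacher demonstrations: state -> action, covering only states 0-3.
teacher_demos = {0: "right", 1: "right", 2: "right", 3: "stay"}

def policy(state: int) -> str:
    """Imitate the teacher where possible; generalize to unseen states."""
    if state in teacher_demos:              # seen state: mimic the teacher exactly
        return teacher_demos[state]
    nearest = min(teacher_demos, key=lambda s: abs(s - state))
    return teacher_demos[nearest]           # unseen state: nearest-demo fallback

print(policy(1))  # a state present in the teacher samples
print(policy(7))  # a state absent from the teacher samples
```

Real imitation-learning methods replace the lookup with a learned function approximator, but the core tension is the same: fidelity to the teacher on covered states versus graceful behavior on states outside the demonstration distribution.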
https://arxiv.org/abs/2404.19456
Medicine and artificial intelligence (AI) engineering represent two distinct fields each with decades of published history. With such history comes a set of terminology that has a specific way in which it is applied. However, when two distinct fields with overlapping terminology start to collaborate, miscommunication and misunderstandings can occur. This narrative review aims to give historical context for these terms, accentuate the importance of clarity when these terms are used in medical AI contexts, and offer solutions to mitigate misunderstandings by readers from either field. Through an examination of historical documents, including articles, writing guidelines, and textbooks, this review traces the divergent evolution of terms for data sets and their impact. Initially, the discordant interpretations of the word 'validation' in medical and AI contexts are explored. Then the data sets used for AI evaluation are classified, namely random splitting, cross-validation, temporal, geographic, internal, and external sets. The accurate and standardized description of these data sets is crucial for demonstrating the robustness and generalizability of AI applications in medicine. This review clarifies existing literature to provide a comprehensive understanding of these classifications and their implications in AI evaluation. This review then identifies often misunderstood terms and proposes pragmatic solutions to mitigate terminological confusion. Among these solutions are the use of standardized terminology such as 'training set,' 'validation (or tuning) set,' and 'test set,' and explicit definition of data set splitting terminologies in each medical AI research publication. This review aspires to enhance the precision of communication in medical AI, thereby fostering more effective and transparent research methodologies in this interdisciplinary field.
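The standardized terminology the review recommends can be made concrete with a minimal random-split sketch: a training set for model fitting, a validation (or tuning) set for hyperparameter selection (the ML sense of "validation", distinct from clinical validation), and a held-out test set for final evaluation. The 70/15/15 proportions are an illustrative convention, not a prescription from the review:

```python
import random

def random_split(records, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle records reproducibly and partition them into three named sets."""
    records = list(records)
    random.Random(seed).shuffle(records)
    n = len(records)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (records[:n_train],                 # training set: model fitting
            records[n_train:n_train + n_val],  # validation (tuning) set
            records[n_train + n_val:])         # test set: final evaluation only

train, val, test = random_split(range(100))
print(len(train), len(val), len(test))
```

Note that this is the *random splitting* scheme from the review's taxonomy; temporal, geographic, internal, and external splits partition by acquisition time, site, or institution instead, and each supports a different generalizability claim.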
https://arxiv.org/abs/2404.19303
Parkinson's disease is a widespread neurodegenerative condition necessitating early diagnosis for effective intervention. This paper introduces an innovative method for diagnosing Parkinson's disease through the analysis of human EEG signals, employing a Support Vector Machine (SVM) classification model. This research presents novel contributions to enhance diagnostic accuracy and reliability. Our approach incorporates a comprehensive review of EEG signal analysis techniques and machine learning methods. Drawing on recent studies, we have engineered an advanced SVM-based model optimized for Parkinson's disease diagnosis. Utilizing cutting-edge feature engineering, extensive hyperparameter tuning, and kernel selection, our method achieves not only heightened diagnostic accuracy but also emphasizes model interpretability, catering to both clinicians and researchers. Moreover, ethical concerns in healthcare machine learning, such as data privacy and biases, are conscientiously addressed. We assess our method's performance through experiments on a diverse dataset comprising EEG recordings from Parkinson's disease patients and healthy controls, demonstrating significantly improved diagnostic accuracy compared to conventional techniques. In conclusion, this paper introduces an innovative SVM-based approach for diagnosing Parkinson's disease from human EEG signals. Building upon the IEEE framework and previous research, its novelty lies in the capacity to enhance diagnostic accuracy while upholding interpretability and ethical considerations for practical healthcare applications. These advances promise to revolutionize early Parkinson's disease detection and management, ultimately contributing to enhanced patient outcomes and quality of life.
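To illustrate the SVM classification stage in isolation, the sketch below trains a linear SVM by subgradient descent on the regularized hinge loss, using synthetic 2-D features as stand-ins for engineered EEG features. The paper's actual pipeline (feature engineering, kernel selection, hyperparameter tuning) is far richer; every name and number here is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X_patients = rng.normal(loc=[2.0, 2.0], size=(40, 2))    # "patient" feature cloud
X_controls = rng.normal(loc=[-2.0, -2.0], size=(40, 2))  # "control" feature cloud
X = np.vstack([X_patients, X_controls])
y = np.array([1] * 40 + [-1] * 40)

# Linear SVM via subgradient descent on: reg/2 * ||w||^2 + mean hinge loss.
w, b = np.zeros(2), 0.0
lr, reg = 0.05, 0.01
for _ in range(300):
    margins = y * (X @ w + b)
    viol = margins < 1                       # points inside or beyond the margin
    grad_w, grad_b = reg * w, 0.0
    if viol.any():
        grad_w = grad_w - (y[viol, None] * X[viol]).mean(axis=0)
        grad_b = -float(y[viol].mean())
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = float((np.sign(X @ w + b) == y).mean())
print(f"training accuracy: {accuracy:.2f}")
```

A kernelized SVM, as the paper's kernel-selection step implies, replaces the inner products here with a kernel function; the linear case keeps the weight vector directly inspectable, which is one route to the interpretability the abstract emphasizes.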
https://arxiv.org/abs/2405.00741
The idea of using deep-learning-based molecular generation to accelerate discovery of drug candidates has attracted extraordinary attention, and many deep generative models have been developed for automated drug design, termed molecular generation. In general, molecular generation encompasses two main strategies: de novo design, which generates novel molecular structures from scratch, and lead optimization, which refines existing molecules into drug candidates. Among them, lead optimization plays an important role in real-world drug design. For example, it can enable the development of me-better drugs that are chemically distinct yet more effective than the original drugs. It can also facilitate fragment-based drug design, transforming virtual-screened small ligands with low affinity into first-in-class medicines. Despite its importance, automated lead optimization remains underexplored compared to the well-established de novo generative models, due to its reliance on complex biological and chemical knowledge. To bridge this gap, we conduct a systematic review of traditional computational methods for lead optimization, organizing these strategies into four principal sub-tasks with defined inputs and outputs. This review delves into the basic concepts, goals, conventional CADD techniques, and recent advancements in AIDD. Additionally, we introduce a unified perspective based on constrained subgraph generation to harmonize the methodologies of de novo design and lead optimization. Through this lens, de novo design can incorporate strategies from lead optimization to address the challenge of generating hard-to-synthesize molecules; inversely, lead optimization can benefit from the innovations in de novo design by approaching it as a task of generating molecules conditioned on certain substructures.
https://arxiv.org/abs/2404.19230
With the rapid proliferation of artificial intelligence, there is growing concern over its potential to exacerbate existing biases and societal disparities and introduce novel ones. This issue has prompted widespread attention from academia, policymakers, industry, and civil society. While evidence suggests that integrating human perspectives can mitigate bias-related issues in AI systems, it also introduces challenges associated with cognitive biases inherent in human decision-making. Our research focuses on reviewing existing methodologies and ongoing investigations aimed at understanding annotation attributes that contribute to bias.
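One quantity commonly examined when studying annotation attributes is Cohen's kappa, which measures inter-annotator agreement beyond chance; systematically low kappa on certain item types can flag where annotator background or cognitive bias is shaping labels. The two label sequences below are invented for illustration:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy toxicity annotations from two annotators over eight items.
ann_a = ["toxic", "ok", "ok", "toxic", "ok", "ok", "toxic", "ok"]
ann_b = ["toxic", "ok", "toxic", "toxic", "ok", "ok", "ok", "ok"]
print(f"kappa = {cohens_kappa(ann_a, ann_b):.3f}")
```

Kappa alone does not reveal *which* annotator attribute drives disagreement, which is why the studies reviewed here pair agreement statistics with demographic and task-design metadata.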
https://arxiv.org/abs/2404.19071
This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advancements and remarkable abilities in multimodal tasks. Despite these promising developments, MLLMs often generate outputs that are inconsistent with the visual content, a challenge known as hallucination, which poses substantial obstacles to their practical deployment and raises concerns regarding their reliability in real-world applications. This problem has attracted increasing attention, prompting efforts to detect and mitigate such inaccuracies. We review recent advances in identifying, evaluating, and mitigating these hallucinations, offering a detailed overview of the underlying causes, evaluation benchmarks, metrics, and strategies developed to address this issue. Additionally, we analyze the current challenges and limitations, formulating open questions that delineate potential pathways for future research. By presenting a granular classification and landscape of hallucination causes, evaluation benchmarks, and mitigation methods, this survey aims to deepen the understanding of hallucinations in MLLMs and inspire further advancements in the field. Through our thorough and in-depth review, we contribute to the ongoing dialogue on enhancing the robustness and reliability of MLLMs, providing valuable insights and resources for researchers and practitioners alike. Resources are available at: this https URL.
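The evaluation side of this line of work can be made concrete with a toy check in the spirit of object-based hallucination metrics such as CHAIR: extract the objects an MLLM mentions in its caption and compare them against the ground-truth object list for the image. The caption, vocabulary, and ground truth below are invented for illustration; real metrics use curated object vocabularies and synonym handling.

```python
# Hedged sketch of an object-hallucination check: objects mentioned in the
# caption but absent from the image's ground-truth annotations count as
# hallucinated. Inputs are illustrative, not from any real benchmark.
def hallucinated_objects(caption, ground_truth, vocabulary):
    tokens = {w.strip(".,").lower() for w in caption.split()}
    mentioned = tokens & vocabulary           # only count known object words
    return mentioned - ground_truth           # mentioned but not in the image

caption = "A dog and a frisbee on the beach next to a surfboard."
truth = {"dog", "frisbee", "beach"}
vocab = {"dog", "cat", "frisbee", "beach", "surfboard", "car"}
print(hallucinated_objects(caption, truth, vocab))  # {'surfboard'}
```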
https://arxiv.org/abs/2404.18930
We introduce Holmes, a benchmark to assess the linguistic competence of language models (LMs) - their ability to grasp linguistic phenomena. Unlike prior prompting-based evaluations, Holmes assesses the linguistic competence of LMs via their internal representations using classifier-based probing. In doing so, we disentangle specific phenomena (e.g., part-of-speech of words) from other cognitive abilities, like following textual instructions, and meet recent calls to assess LMs' linguistic competence in isolation. To compose Holmes, we reviewed over 250 probing studies and feature more than 200 datasets to assess syntax, morphology, semantics, reasoning, and discourse phenomena. Analyzing over 50 LMs reveals that, in line with known trends, their linguistic competence correlates with model size. However, surprisingly, model architecture and instruction tuning also significantly influence performance, particularly in morphology and syntax. Finally, we propose FlashHolmes, a streamlined version of Holmes designed to reduce the computational load while maintaining high ranking precision.
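Classifier-based probing, the core technique named above, amounts to freezing a model's hidden representations and training a small classifier to predict a linguistic label from them; high probe accuracy suggests the phenomenon is encoded. The sketch below illustrates this with synthetic "hidden states" in which a toy part-of-speech label is linearly decodable; it is a schematic illustration of the technique, not Holmes itself.

```python
# Minimal probing sketch: fit a linear probe on frozen (here: synthetic)
# representations to predict a toy linguistic label (0 = NOUN, 1 = VERB).
import numpy as np

rng = np.random.default_rng(0)
n, dim = 200, 16
labels = rng.integers(0, 2, size=n)          # toy POS tags
reps = rng.normal(size=(n, dim))             # stand-in for LM activations
reps[:, 0] += 4.0 * labels                   # plant a linear signal

# Closed-form least-squares probe; logistic probes are more common in
# practice but need no extra machinery for this illustration.
w, *_ = np.linalg.lstsq(reps, labels.astype(float), rcond=None)
preds = (reps @ w > 0.5).astype(int)
accuracy = (preds == labels).mean()
print(accuracy)  # high accuracy suggests the label is encoded
```

On real LM activations the probe is trained on held-out tokens and compared against control tasks, so that probe capacity is not mistaken for linguistic knowledge in the representations.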
https://arxiv.org/abs/2404.18923
The study of time series data is crucial for understanding trends and anomalies over time, enabling predictive insights across various sectors. Spatio-temporal data, on the other hand, is vital for analyzing phenomena in both space and time, providing a dynamic perspective on complex system interactions. Recently, diffusion models have seen widespread application in time series and spatio-temporal data mining. Not only do they enhance the generative and inferential capabilities for sequential and temporal data, but they also extend to other downstream tasks. In this survey, we comprehensively review the use of diffusion models in time series and spatio-temporal data, categorizing them by model category, task type, data modality, and practical application domain. In detail, we categorize diffusion models into unconditioned and conditioned types and discuss time series data and spatio-temporal data separately. Unconditioned models, which operate unsupervised, are subdivided into probability-based and score-based models, serving predictive and generative tasks such as forecasting, anomaly detection, classification, and imputation. Conditioned models, on the other hand, utilize extra information to enhance performance and are similarly divided between predictive and generative tasks. Our survey extensively covers their application in various fields, including healthcare, recommendation, climate, energy, audio, and transportation, providing a foundational understanding of how these models analyze and generate data. Through this structured overview, we aim to provide researchers and practitioners with a comprehensive understanding of diffusion models for time series and spatio-temporal data analysis, and to direct future innovations and applications by addressing traditional challenges and exploring new solutions within the diffusion model framework.
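The diffusion mechanism the survey organizes can be shown in miniature: a forward process gradually corrupts data with Gaussian noise following a fixed variance schedule, and a reverse process is learned to denoise. In the hedged sketch below, a simple 1-D "time series" is noised with a standard DDPM-style schedule, and an oracle that knows the noise stands in for the trained denoising network, demonstrating that the forward process is exactly invertible given the noise.

```python
# Toy DDPM-style forward process on 1-D data: x_t = sqrt(abar_t)*x0 +
# sqrt(1-abar_t)*eps. A trained network would estimate eps; here an oracle
# that knows eps stands in for it, so recovery of x0 is exact.
import numpy as np

T = 100
betas = np.linspace(1e-4, 0.02, T)            # linear variance schedule
alphas_bar = np.cumprod(1.0 - betas)          # cumulative signal retention

rng = np.random.default_rng(1)
x0 = rng.normal(loc=2.0, scale=0.1, size=1000)  # toy "time series" samples

t = T - 1                                     # noisiest step
eps = rng.normal(size=x0.shape)
xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1 - alphas_bar[t]) * eps

# Invert the forward step given the (oracle) noise:
x0_hat = (xt - np.sqrt(1 - alphas_bar[t]) * eps) / np.sqrt(alphas_bar[t])
print(np.allclose(x0_hat, x0))  # True
```

Conditioned variants discussed in the survey inject extra information (e.g., past observations for forecasting, or observed entries for imputation) into the learned denoiser; the forward corruption process stays the same.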
https://arxiv.org/abs/2404.18886