In the realm of robotics, the quest for real-world autonomy, capable of executing large-scale and long-term operations, has positioned place recognition (PR) as a cornerstone technology. Despite the PR community's remarkable strides over the past two decades, garnering attention from fields such as computer vision and robotics, the development of PR methods that sufficiently support real-world robotic systems remains a challenge. This paper aims to bridge this gap by highlighting the crucial role of PR within the framework of Simultaneous Localization and Mapping (SLAM) 2.0. This new phase in robotic navigation calls for scalable, adaptable, and efficient PR solutions built on advanced artificial intelligence (AI) technologies. To this end, we provide a comprehensive review of current state-of-the-art (SOTA) advancements in PR, alongside the remaining challenges, and underscore its broad applications in robotics. The paper begins with an exploration of PR's formulation and key research challenges. We extensively review the literature, focusing on methods for place representation and solutions to various PR challenges. Applications showcasing PR's potential in robotics, key PR datasets, and open-source libraries are discussed. We also highlight our open-source package, aimed at new development and benchmarking for general PR. We conclude with a discussion of PR's future directions, accompanied by a summary of the literature covered and access to our open-source library, available to the robotics community at: this https URL.
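As a rough illustration of the retrieval formulation that underlies PR, a query descriptor is matched against a database of descriptors for mapped places. This is a minimal sketch under our own assumptions (the descriptor layout, function name, and threshold are illustrative, not from the survey):

```python
import numpy as np

def recognize_place(query, database, threshold=0.8):
    """Return (best_index, score) if the best cosine match clears the
    threshold, else (None, score). Rows of `database` are global place
    descriptors; `query` is the descriptor of the current observation."""
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = db @ q  # cosine similarity to every mapped place
    best = int(np.argmax(scores))
    return (best if scores[best] >= threshold else None), float(scores[best])
```

Real systems differ mainly in how the descriptors are built (hand-crafted vs. learned) and how the search is indexed at scale.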
https://arxiv.org/abs/2405.04812
The rapid advancement of Large Language Models (LLMs) has opened up new opportunities for leveraging artificial intelligence in various domains, including cybersecurity. As the volume and sophistication of cyber threats continue to grow, there is an increasing need for intelligent systems that can automatically detect vulnerabilities, analyze malware, and respond to attacks. In this survey, we conduct a comprehensive review of the literature on the application of LLMs in cybersecurity (LLM4Security). By comprehensively collecting over 30K relevant papers and systematically analyzing 127 papers from top security and software engineering venues, we aim to provide a holistic view of how LLMs are being used to solve diverse problems across the cybersecurity domain. Through our analysis, we identify several key findings. First, we observe that LLMs are being applied to a wide range of cybersecurity tasks, including vulnerability detection, malware analysis, network intrusion detection, and phishing detection. Second, we find that the datasets used for training and evaluating LLMs in these tasks are often limited in size and diversity, highlighting the need for more comprehensive and representative datasets. Third, we identify several promising techniques for adapting LLMs to specific cybersecurity domains, such as fine-tuning, transfer learning, and domain-specific pre-training. Finally, we discuss the main challenges and opportunities for future research in LLM4Security, including the need for more interpretable and explainable models, the importance of addressing data privacy and security concerns, and the potential for leveraging LLMs for proactive defense and threat hunting. Overall, our survey provides a comprehensive overview of the current state-of-the-art in LLM4Security and identifies several promising directions for future research.
https://arxiv.org/abs/2405.04760
Purpose: To develop and evaluate a transformer-based deep learning model for the synthesis of nephrographic phase images in CT urography (CTU) examinations from the unenhanced and urographic phases. Materials and Methods: This retrospective study was approved by the local Institutional Review Board. A dataset of 119 patients (mean $\pm$ SD age, 65 $\pm$ 12 years; 75/44 males/females) with three-phase CT urography studies was curated for deep learning model development. The three phases for each patient were aligned with an affine registration algorithm. A custom model, coined Residual transformer model for Nephrographic phase CT image synthesis (ResNCT), was developed and implemented with paired inputs of non-contrast and urographic image sets, trained to produce nephrographic phase images that were compared with the corresponding ground truth nephrographic phase images. The synthesized images were evaluated with multiple performance metrics, including peak signal to noise ratio (PSNR), structural similarity index (SSIM), normalized cross correlation coefficient (NCC), mean absolute error (MAE), and root mean squared error (RMSE). Results: The ResNCT model successfully generated synthetic nephrographic images from non-contrast and urographic image inputs. With respect to ground truth nephrographic phase images, the images synthesized by the model achieved high PSNR (27.8 $\pm$ 2.7 dB), SSIM (0.88 $\pm$ 0.05), and NCC (0.98 $\pm$ 0.02), and low MAE (0.02 $\pm$ 0.005) and RMSE (0.042 $\pm$ 0.016). Conclusion: The ResNCT model synthesized nephrographic phase CT images with high similarity to ground truth images. The ResNCT model provides a means of eliminating the acquisition of the nephrographic phase with a resultant 33% reduction in radiation dose for CTU examinations.
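The reported metrics (except SSIM, which needs windowed statistics) have standard definitions that can be sketched directly; the code below is a generic illustration of those formulas, not the authors' evaluation pipeline:

```python
import numpy as np

def mae(x, y):
    return float(np.mean(np.abs(x - y)))  # mean absolute error

def rmse(x, y):
    return float(np.sqrt(np.mean((x - y) ** 2)))  # root mean squared error

def psnr(x, y, data_range):
    # Peak signal-to-noise ratio in dB over the stated intensity range.
    return float(20 * np.log10(data_range / rmse(x, y)))

def ncc(x, y):
    # Normalized (zero-mean) cross correlation coefficient.
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))
```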
https://arxiv.org/abs/2405.04629
Artificial neural networks (ANNs) perform extraordinarily well on numerous tasks, including classification and prediction, e.g., speech processing and image classification. These capabilities are based on a computational model that is free to select all necessary internal model parameters, so long as it eventually delivers the functionality it is supposed to exhibit. Here, we review the connection between the model parameter selection in machine learning (ML) algorithms running on ANNs and the epistemological theory of neopragmatism, focusing on the theory's utility and anti-representationalist aspects. To understand the consequences of the model parameter selection of an ANN, we suggest using neopragmatist theories whose implications are well studied. Incidentally, neopragmatism's notion of optimization is also based on utility considerations. This means that applying this approach elegantly reveals the inherent connections between optimization in ML, using a numerical method during the learning phase, and optimization in the ethical theory of consequentialism, where it occurs as a maxim of action. We suggest that these connections originate from the way relevance is calculated in ML systems. This could ultimately reveal a tendency for specific actions in ML systems.
https://arxiv.org/abs/2405.04386
Structured finance, which involves restructuring diverse assets into securities like MBS, ABS, and CDOs, enhances capital market efficiency but presents significant due diligence challenges. This study explores the integration of artificial intelligence (AI) with traditional asset review processes to improve efficiency and accuracy in structured finance. Using both open-source and closed-source large language models (LLMs), we demonstrate that AI can effectively automate the verification of information between loan applications and bank statements. While closed-source models such as GPT-4 show superior performance, open-source models like LLAMA3 offer a cost-effective alternative. Dual-agent systems further increase accuracy, though this comes with higher operational costs. This research highlights AI's potential to minimize manual errors and streamline due diligence, suggesting a broader application of AI in financial document analysis and risk management.
https://arxiv.org/abs/2405.04294
Technological advancements have substantially increased computational power and data availability, enabling the application of powerful machine-learning (ML) techniques across various fields. However, our ability to leverage ML methods for scientific discovery, {\it i.e.} to obtain fundamental and formalized knowledge about natural processes, is still in its infancy. In this review, we explore how the scientific community can increasingly leverage ML techniques to achieve scientific discoveries. We observe that the applicability and opportunity of ML depends strongly on the nature of the problem domain, and whether we have full ({\it e.g.}, turbulence), partial ({\it e.g.}, computational biochemistry), or no ({\it e.g.}, neuroscience) {\it a-priori} knowledge about the governing equations and physical properties of the system. Although challenges remain, principled use of ML is opening up new avenues for fundamental scientific discoveries. Throughout these diverse fields, there is a theme that ML is enabling researchers to embrace complexity in observational data that was previously intractable to classic analysis and numerical investigations.
https://arxiv.org/abs/2405.04161
Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph-based tasks. However, their inefficiency in training and inference presents challenges for scaling up to real-world and large-scale graph applications. To address these critical challenges, a range of algorithms have been proposed to accelerate training and inference of GNNs, attracting increasing attention from the research community. In this paper, we present a systematic review of acceleration algorithms in GNNs, which can be categorized into three main topics based on their purpose: training acceleration, inference acceleration, and execution acceleration. Specifically, we summarize and categorize the existing approaches for each main topic, and provide detailed characterizations of the approaches within each category. Additionally, we review several libraries related to acceleration algorithms in GNNs and discuss our Scalable Graph Learning (SGL) library. Finally, we propose promising directions for future research. A complete summary is presented in our GitHub repository: this https URL.
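One widely used family of training-acceleration techniques is fixed-fanout neighbor sampling (in the style of GraphSAGE), which bounds the cost of each mini-batch. A minimal sketch, with our own function name and adjacency-list representation:

```python
import random

def sample_neighbors(adj, seeds, fanout, rng=random.Random(0)):
    """One layer of fixed-fanout neighbor sampling: for each seed node,
    keep at most `fanout` randomly chosen neighbors, so the receptive
    field of a mini-batch grows by a bounded factor per GNN layer."""
    sampled = {}
    for v in seeds:
        nbrs = adj.get(v, [])
        if len(nbrs) <= fanout:
            sampled[v] = list(nbrs)
        else:
            sampled[v] = rng.sample(nbrs, fanout)
    return sampled
```

Stacking this layer by layer yields the sampled computation graph on which message passing is actually run.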
https://arxiv.org/abs/2405.04114
Crafting effective prompts for code generation or editing with Large Language Models (LLMs) is not an easy task. Particularly, the absence of immediate, stable feedback during prompt crafting hinders effective interaction, as users are left to mentally imagine possible outcomes until the code is generated. In response, we introduce Language-Oriented Code Sketching, an interactive approach that provides instant, incremental feedback in the form of code sketches (i.e., incomplete code outlines) during prompt crafting. This approach converts a prompt into a code sketch by leveraging the inherent linguistic structures within the prompt and applying classic natural language processing techniques. The sketch then serves as an intermediate placeholder that not only previews the intended code structure but also guides the LLM towards the desired code, thereby enhancing human-LLM interaction. We conclude by discussing the approach's applicability and future plans.
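To make the idea concrete, here is a toy illustration of turning a prompt's linguistic structure into an incomplete code outline. The regex patterns and function name are invented for illustration; the actual system uses richer classic NLP techniques:

```python
import re

def prompt_to_sketch(prompt):
    """Toy sketch: mine the prompt for a function name and parameters,
    and emit an incomplete code outline (a "code sketch") that previews
    the intended structure before any code is generated."""
    name = re.search(r"function (?:called |named )?(\w+)", prompt)
    args = re.findall(r"takes? ((?:\w+(?:,? and |, )?)+)", prompt)
    params = re.split(r",? and |, ", args[0]) if args else []
    header = f"def {name.group(1) if name else 'sketch'}({', '.join(params)}):"
    return header + "\n    ...  # body to be filled in by the LLM"
```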
https://arxiv.org/abs/2405.03998
Previous studies have demonstrated that proactive interaction with user reviews has a positive impact on the perception of app users and encourages them to submit revised ratings. Nevertheless, developers encounter challenges in managing a high volume of reviews, particularly in the case of popular apps with a substantial influx of daily reviews. Consequently, there is a demand for automated solutions aimed at streamlining the process of responding to user reviews. To address this, we have developed a new system for generating automatic responses by leveraging user-contributed documents with the help of retrieval-augmented generation (RAG) and advanced Large Language Models (LLMs). Our solution, named SCRABLE, represents an adaptive customer review response automation that enhances itself with self-optimizing prompts and a judging mechanism based on LLMs. Additionally, we introduce an automatic scoring mechanism that mimics the role of a human evaluator to assess the quality of responses generated in customer review domains. Extensive experiments and analyses conducted on real-world datasets reveal that our method is effective in producing high-quality responses, yielding an improvement of more than 8.5% compared to the baseline. Further validation through manual examination of the generated responses underscores the efficacy of our proposed system.
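A self-optimizing generate-judge-refine loop of the kind described can be sketched as follows. The function names, signatures, and scoring scale are our own stand-ins for the LLM calls, not SCRABLE's actual API:

```python
def respond_with_self_optimization(review, generate, judge, refine,
                                   threshold=8, max_rounds=3):
    """Hedged sketch of a self-optimizing response loop:
    `generate(prompt, review)` drafts a reply, `judge(review, response)`
    scores it 0-10 (the LLM-based judging mechanism), and
    `refine(prompt, feedback)` rewrites the prompt when the score is low."""
    prompt = "Write a polite, helpful reply to this app review."
    best_response, best_score = None, -1
    for _ in range(max_rounds):
        response = generate(prompt, review)
        score = judge(review, response)
        if score > best_score:
            best_response, best_score = response, score
        if score >= threshold:
            break  # good enough; stop refining
        prompt = refine(prompt, f"previous reply scored {score}/10")
    return best_response, best_score
```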
https://arxiv.org/abs/2405.03845
The rapid advancements in large language models (LLMs) have opened new avenues across various fields, including cybersecurity, which faces an ever-evolving threat landscape and a need for innovative technologies. Despite initial explorations into the application of LLMs in cybersecurity, there is a lack of a comprehensive overview of this research area. This paper bridges this gap by providing a systematic literature review, encompassing an analysis of over 180 works spanning 25 LLMs and more than 10 downstream scenarios. Our comprehensive overview addresses three critical research questions: the construction of cybersecurity-oriented LLMs, LLMs' applications in various cybersecurity tasks, and the existing challenges and further research in this area. This study aims to shed light on the extensive potential of LLMs in enhancing cybersecurity practices, and to serve as a valuable resource for applying LLMs in this domain. We also maintain a regularly updated list of practical guides on LLMs for cybersecurity at this https URL.
https://arxiv.org/abs/2405.03644
Miniaturization of cameras and LiDAR sensors has enabled the development of wearable 3D mapping systems for emergency responders. These systems have the potential to revolutionize response capabilities by providing real-time, high-fidelity maps of dynamic and hazardous environments. We present our recent efforts towards the development of such ultra-portable 3D mapping systems. We review four different sensor configurations, either helmet-mounted or body-worn, with two different mapping algorithms that were implemented and evaluated during field trials. The paper discusses the experimental results with the aim to stimulate further discussion within the portable 3D mapping research community.
https://arxiv.org/abs/2405.03514
Image-based 3D reconstruction is a challenging task that involves inferring the 3D shape of an object or scene from a set of input images. Learning-based methods have gained attention for their ability to directly estimate 3D shapes. This review paper focuses on state-of-the-art techniques for 3D reconstruction, including the generation of novel, unseen views. An overview of recent developments in the Gaussian Splatting method is provided, covering input types, model structures, output representations, and training strategies. Unresolved challenges and future directions are also discussed. Given the rapid progress in this domain and the numerous opportunities for enhancing 3D reconstruction methods, a comprehensive examination of algorithms appears essential. Consequently, this study offers a thorough overview of the latest advancements in Gaussian Splatting.
https://arxiv.org/abs/2405.03417
Purpose: Autonomous navigation of devices in endovascular interventions can decrease operation times, improve decision-making during surgery, and reduce operator radiation exposure while increasing access to treatment. This systematic review explores recent literature to assess the impact, challenges, and opportunities artificial intelligence (AI) presents for autonomous endovascular intervention navigation. Methods: PubMed and IEEE Xplore databases were queried. Eligibility criteria included studies investigating the use of AI in enabling the autonomous navigation of catheters/guidewires in endovascular interventions. Following PRISMA, articles were assessed using QUADAS-2. PROSPERO: CRD42023392259. Results: Among 462 studies, fourteen met inclusion criteria. Reinforcement learning (9/14, 64%) and learning from demonstration (7/14, 50%) were used as data-driven models for autonomous navigation. Studies predominantly utilised physical phantoms (10/14, 71%) and in silico (4/14, 29%) models. Experiments within or around the blood vessels of the heart were reported by the majority of studies (10/14, 71%), while simple non-anatomical vessel platforms were used in three studies (3/14, 21%), and the porcine liver venous system in one study. We observed that risk of bias and poor generalisability were present across studies. No procedures were performed on patients in any of the studies reviewed. Studies lacked patient selection criteria, reference standards, and reproducibility, resulting in low levels of clinical evidence. Conclusions: AI's potential in autonomous endovascular navigation is promising, but it remains at an experimental proof-of-concept stage, with a technology readiness level of 3. We highlight that reference standards with well-identified performance metrics are crucial to allow for comparisons of the data-driven algorithms proposed in the years to come.
https://arxiv.org/abs/2405.03305
In the last three decades, the Steered Response Power (SRP) method has been widely used for the task of Sound Source Localization (SSL), due to its satisfactory localization performance on moderately reverberant and noisy scenarios. Many works have analyzed and extended the original SRP method to reduce its computational cost, to allow it to locate multiple sources, or to improve its performance in adverse environments. In this work, we review over 200 papers on the SRP method and its variants, with emphasis on the SRP-PHAT method. We also present eXtensible-SRP, or X-SRP, a generalized and modularized version of the SRP algorithm which allows the reviewed extensions to be implemented. We provide a Python implementation of the algorithm which includes selected extensions from the literature.
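The building block of SRP-PHAT is the PHAT-weighted generalized cross correlation between a microphone pair; the full SRP map then sums such correlations over all pairs and candidate source positions. A minimal sketch of the pairwise step for integer sample delays (function name and wrapping convention are ours):

```python
import numpy as np

def gcc_phat_delay(x1, x2):
    """Estimate the integer delay d such that x2[n] ≈ x1[n - d], using
    the PHAT weighting: the cross-power spectrum is normalized to unit
    magnitude so only phase (i.e., timing) information remains."""
    n = len(x1)
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    R = X2 * np.conj(X1)
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n=n)  # PHAT-weighted GCC
    d = int(np.argmax(cc))
    return d - n if d > n // 2 else d  # map large lags to negative delays
```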
https://arxiv.org/abs/2405.02991
In the past, research on single low-dimensional activation functions in networks has led to internal covariate shift and gradient deviation problems. A relatively small research area is how to use function combinations to provide property completion for a single activation function application. We propose a network adversarial method to address the aforementioned challenges. This is the first method to use different activation functions in a network. Based on the existing activation functions in the current network, an adversarial function with opposite derivative image properties is constructed, and the two are alternately used as activation functions for different network layers. For complex situations, we propose a method of high-dimensional function graph decomposition (HD-FGD), which divides a function into different parts that are then passed through a linear layer. After integrating the inverse of the partial derivatives of each decomposed term, we obtain its adversarial function by referring to the computational rules of the decomposition process. Using the network adversarial method or HD-FGD alone can effectively replace the traditional MLP + activation function mode. Through the above methods, we have achieved a substantial improvement over standard activation functions regarding both training efficiency and predictive accuracy. The article addresses the adversarial issues associated with several prevalent activation functions, presenting alternatives that can be seamlessly integrated into existing models without any adverse effects. We will release the code as open source after the conference review process is completed.
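The "alternating activations across layers" idea can be illustrated with a toy forward pass. The pairing below (ReLU and its reflection, whose derivative is nonzero exactly where ReLU's is zero) is our illustrative guess at "opposite derivative image properties", not the paper's actual construction:

```python
import numpy as np

relu = lambda x: np.maximum(x, 0)   # derivative: 0 for x<0, 1 for x>0
nrelu = lambda x: np.minimum(x, 0)  # mirrored derivative: 1 for x<0, 0 for x>0

def forward(x, weights, acts=(relu, nrelu)):
    """Toy MLP forward pass that alternates a base activation and a
    companion with mirrored derivative support across hidden layers."""
    for i, W in enumerate(weights[:-1]):
        x = acts[i % len(acts)](x @ W)  # alternate activations per layer
    return x @ weights[-1]              # linear output layer
```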
https://arxiv.org/abs/2405.03712
In recent years, autonomous driving has garnered escalating attention for its potential to relieve drivers' burdens and improve driving safety. Vision-based 3D occupancy prediction, which predicts the spatial occupancy status and semantics of 3D voxel grids around the autonomous vehicle from image inputs, is an emerging perception task suitable for cost-effective perception system of autonomous driving. Although numerous studies have demonstrated the greater advantages of 3D occupancy prediction over object-centric perception tasks, there is still a lack of a dedicated review focusing on this rapidly developing field. In this paper, we first introduce the background of vision-based 3D occupancy prediction and discuss the challenges in this task. Secondly, we conduct a comprehensive survey of the progress in vision-based 3D occupancy prediction from three aspects: feature enhancement, deployment friendliness and label efficiency, and provide an in-depth analysis of the potentials and challenges of each category of methods. Finally, we present a summary of prevailing research trends and propose some inspiring future outlooks. To provide a valuable reference for researchers, a regularly updated collection of related papers, datasets, and codes is organized at this https URL.
https://arxiv.org/abs/2405.02595
Nowadays, large-scale foundation models are being increasingly integrated into numerous safety-critical applications, including human-autonomy teaming (HAT) within transportation, medical, and defence domains. Consequently, the inherent 'black-box' nature of these sophisticated deep neural networks heightens the significance of fostering mutual understanding and trust between humans and autonomous systems. To tackle the transparency challenges in HAT, this paper conducts a thoughtful study on the underexplored domain of Explainable Interface (EI) in HAT systems from a human-centric perspective, thereby enriching the existing body of research in Explainable Artificial Intelligence (XAI). We explore the design, development, and evaluation of EI within XAI-enhanced HAT systems. To do so, we first clarify the distinctions between these concepts: EI, explanations and model explainability, aiming to provide researchers and practitioners with a structured understanding. Second, we contribute to a novel framework for EI, addressing the unique challenges in HAT. Last, our summarized evaluation framework for ongoing EI offers a holistic perspective, encompassing model performance, human-centered factors, and group task objectives. Based on extensive surveys across XAI, HAT, psychology, and Human-Computer Interaction (HCI), this review offers multiple novel insights into incorporating XAI into HAT systems and outlines future directions.
https://arxiv.org/abs/2405.02583
As generative artificial intelligence (AI), particularly Large Language Models (LLMs), continues to permeate healthcare, it remains crucial to supplement traditional automated evaluations with human expert evaluation. Understanding and evaluating the generated texts is vital for ensuring safety, reliability, and effectiveness. However, the cumbersome, time-consuming, and non-standardized nature of human evaluation presents significant obstacles to the widespread adoption of LLMs in practice. This study reviews existing literature on human evaluation methodologies for LLMs within healthcare. We highlight a notable need for a standardized and consistent human evaluation approach. Our extensive literature search, adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, spans publications from January 2018 to February 2024. This review provides a comprehensive overview of the human evaluation approaches used in diverse healthcare applications. This analysis examines the human evaluation of LLMs across various medical specialties, addressing factors such as evaluation dimensions, sample types, and sizes, the selection and recruitment of evaluators, frameworks and metrics, the evaluation process, and statistical analysis of the results. Drawing from diverse evaluation strategies highlighted in these studies, we propose a comprehensive and practical framework for human evaluation of generative LLMs, named QUEST: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence. This framework aims to improve the reliability, generalizability, and applicability of human evaluation of generative LLMs in different healthcare applications by defining clear evaluation dimensions and offering detailed guidelines.
https://arxiv.org/abs/2405.02559
Advancements in wearable sensor technologies and the digitization of medical records have contributed to the unprecedented ubiquity of biomedical time series data. Data-driven models have tremendous potential to assist clinical diagnosis and improve patient care by improving long-term monitoring capabilities, facilitating early disease detection and intervention, as well as promoting personalized healthcare delivery. However, accessing extensively labeled datasets to train data-hungry deep learning models encounters many barriers, such as long-tail distribution of rare diseases, cost of annotation, privacy and security concerns, data-sharing regulations, and ethical considerations. An emerging approach to overcome the scarcity of labeled data is to augment AI methods with human-like capabilities to leverage past experiences to learn new tasks with limited examples, called few-shot learning. This survey provides a comprehensive review and comparison of few-shot learning methods for biomedical time series applications. The clinical benefits and limitations of such methods are discussed in relation to traditional data-driven approaches. This paper aims to provide insights into the current landscape of few-shot learning for biomedical time series and its implications for future research and applications.
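A representative few-shot learning approach covered by such surveys is prototypical classification: each class is summarized by the mean of its few labeled support embeddings, and queries take the label of the nearest prototype. A minimal sketch (a generic illustration, not tied to any specific method in the survey):

```python
import numpy as np

def prototypical_predict(support_x, support_y, query_x):
    """Prototypical-network-style prediction: build one prototype per
    class as the mean of its support embeddings, then assign each query
    to the class of its nearest prototype (Euclidean distance)."""
    classes = np.unique(support_y)
    protos = np.stack([support_x[support_y == c].mean(axis=0) for c in classes])
    # Pairwise distances: (n_query, n_class)
    d = np.linalg.norm(query_x[:, None, :] - protos[None, :, :], axis=-1)
    return classes[np.argmin(d, axis=1)]
```

In biomedical time series applications, `support_x`/`query_x` would typically be embeddings produced by a pretrained encoder rather than raw signals.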
https://arxiv.org/abs/2405.02485
Automatic citation generation for sentences in a document or report is paramount for intelligence analysts, cybersecurity, news agencies, and education personnel. In this research, we investigate whether large language models (LLMs) are capable of generating references based on two forms of sentence queries: (a) Direct Queries, where LLMs are asked to provide author names of the given research article, and (b) Indirect Queries, where LLMs are asked to provide the title of a mentioned article when given a sentence from a different article. To demonstrate where LLMs stand in this task, we introduce a large dataset called REASONS comprising abstracts of the 12 most popular domains of scientific research on arXiv. From around 20K research articles, we make the following deductions on public and proprietary LLMs: (a) State-of-the-art, often called anthropomorphic GPT-4 and GPT-3.5, suffers from high pass percentage (PP) to minimize the hallucination rate (HR). When tested with this http URL (7B), they unexpectedly made more errors; (b) Augmenting relevant metadata lowered the PP and gave the lowest HR; (c) Advance retrieval-augmented generation (RAG) using Mistral demonstrates consistent and robust citation support on indirect queries and matched performance to GPT-3.5 and GPT-4. The HR across all domains and models decreased by an average of 41.93% and the PP was reduced to 0% in most cases. In terms of generation quality, the average F1 Score and BLEU were 68.09% and 57.51%, respectively; (d) Testing with adversarial samples showed that LLMs, including the Advance RAG Mistral, struggle to understand context, but the extent of this issue was small in Mistral and GPT-4-Preview. Our study contributes valuable insights into the reliability of RAG for automated citation generation tasks.
https://arxiv.org/abs/2405.02228