In this paper, we introduce the first comprehensive multilingual sign language dataset, Prompt2Sign, built from public data covering American Sign Language (ASL) and seven other sign languages. Our dataset transforms a vast array of videos into a streamlined, model-friendly format optimized for training with translation models such as seq2seq and text2text. Building on this new dataset, we propose SignLLM, the first multilingual Sign Language Production (SLP) model, which includes two novel multilingual SLP modes that allow the generation of sign language gestures from input text or prompts. Both modes can use a new loss function and a reinforcement learning-based module that accelerate training by enhancing the model's ability to autonomously sample high-quality data. We present benchmark results showing that SignLLM achieves state-of-the-art performance on SLP tasks across eight sign languages.
https://arxiv.org/abs/2405.10718
Safe navigation in unknown environments is a significant challenge in robotics. Control Barrier Functions (CBFs) are a powerful mathematical tool for guaranteeing safety requirements. However, a common assumption in many works is that the CBF is already known and that obstacles have predefined shapes. In this letter, we present a novel method called Occupancy Grid Map-based Control Barrier Function (OGM-CBF), which defines the control barrier function on top of occupancy grid maps. This enables generalization to unknown environments while generating online local or global maps of the environment using onboard perception sensors such as LiDAR or cameras. With this method, the system guarantees safety via a single, continuously differentiable CBF per time step, which can be represented as one constraint in the CBF-QP optimization formulation, even with an arbitrary number of obstacles of unknown shape in the environment. This enables practical real-time implementation of CBFs in both unknown and known environments. The efficacy of OGM-CBF is demonstrated in the safe control of an autonomous car in the CARLA simulator and of a real-world industrial mobile robot.
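The single-constraint CBF-QP mentioned in the abstract has a simple closed form, which is what makes per-time-step real-time use cheap. Below is a minimal sketch, assuming a control-affine system x' = f(x) + g(x)u and a known barrier h(x); the function name and the single-integrator test case are illustrative, not from the paper:

```python
import numpy as np

def cbf_qp_filter(u_des, grad_h, f, g, h, alpha=1.0):
    """Minimal CBF-QP safety filter with a single constraint.

    Solves  min_u ||u - u_des||^2  s.t.  Lf_h + Lg_h @ u >= -alpha * h,
    which for one affine constraint a @ u >= b has a closed-form solution:
    project the desired input onto the safe half-space.
    """
    Lf_h = grad_h @ f          # drift term of h-dot
    Lg_h = grad_h @ g          # control-dependent term of h-dot
    a, b = Lg_h, -alpha * h - Lf_h
    if a @ u_des >= b:         # desired input is already safe
        return u_des
    # closest point of the half-space {u : a @ u >= b} to u_des
    return u_des + (b - a @ u_des) / (a @ a) * a
```

For a single integrator with a circular obstacle (h(x) = ||x - x_obs||^2 - r^2), the filter leaves safe inputs untouched and minimally deflects unsafe ones.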
https://arxiv.org/abs/2405.10703
The escalating volumes of textile waste globally necessitate innovative waste management solutions to mitigate the environmental impact and promote sustainability in the fashion industry. This paper addresses the inefficiencies of traditional textile sorting methods by introducing an autonomous textile analysis pipeline. Utilising robotics, spectral imaging, and AI-driven classification, our system enhances the accuracy, efficiency, and scalability of textile sorting processes, contributing to a more sustainable and circular approach to waste management. The integration of a Digital Twin system further allows critical evaluation of technical and economic feasibility, providing valuable insights into the sorting system's accuracy and reliability. The proposed framework, inspired by Industry 4.0 principles, comprises five interconnected layers facilitating seamless data exchange and coordination within the system. Preliminary results highlight the potential of our holistic approach to mitigate environmental impact and foster a positive shift towards recycling in the textile industry.
https://arxiv.org/abs/2405.10696
3D occupancy perception holds a pivotal role in recent vision-centric autonomous driving systems by converting surround-view images into integrated geometric and semantic representations within dense 3D grids. Nevertheless, current models still encounter two main challenges: modeling depth accurately in the 2D-3D view transformation stage, and overcoming the lack of generalizability due to sparse LiDAR supervision. To address these issues, this paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception. Our approach is three-fold: 1) integration of explicit lift-based depth prediction and implicit projection-based transformers for depth modeling, enhancing the density and robustness of view transformation; 2) utilization of a mask-based encoder-decoder architecture for fine-grained semantic predictions; and 3) adoption of context-aware self-training loss functions in the pre-training stage to complement LiDAR supervision, involving the re-rendering of 2D depth maps from 3D occupancy features and leveraging an image reconstruction loss to obtain denser depth supervision beyond sparse LiDAR ground truths. Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset with the lowest required image resolution and the most lightweight image backbone among current models, marking an improvement of 3.3% due to our proposed contributions. Comprehensive experimentation also demonstrates the consistent superiority of our method over baselines and alternative approaches.
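The re-rendering of 2D depth maps from 3D occupancy can be illustrated with standard alpha compositing along a camera ray. This is only a sketch of the idea under assumed per-sample occupancy probabilities; the paper renders from learned occupancy features:

```python
import numpy as np

def render_depth(occ_alpha, depths):
    """Expected depth along one ray via alpha compositing.

    occ_alpha: (N,) occupancy probabilities sampled along the ray
               (hypothetical values for illustration).
    depths:    (N,) sample depths along the ray.
    """
    # transmittance before each sample: T_i = prod_{j<i} (1 - alpha_j)
    T = np.cumprod(np.concatenate([[1.0], 1.0 - occ_alpha[:-1]]))
    w = T * occ_alpha                       # contribution weight per sample
    return (w * depths).sum() / max(w.sum(), 1e-8)
```

A fully occupied sample behind free space dominates the weights, so the rendered depth snaps to the first surface, giving a dense depth map that an image reconstruction loss can supervise.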
https://arxiv.org/abs/2405.10591
In this report, we describe the technical details of our submission to the 2024 RoboDrive Challenge Robust Map Segmentation Track. The Robust Map Segmentation track focuses on the segmentation of complex driving scene elements in BEV maps under varied driving conditions. Semantic map segmentation provides abundant and precise static environmental information crucial for the planning and navigation of autonomous driving systems. While current methods excel in ideal circumstances, e.g., clear daytime conditions and fully functional sensors, their resilience to real-world challenges like adverse weather and sensor failures remains unclear, raising concerns about system safety. In this paper, we explore several methods to improve the robustness of the map segmentation task: 1) robustness analysis of utilizing temporal information; 2) robustness analysis of utilizing different backbones; and 3) data augmentation to boost corruption robustness. Based on the evaluation results, we draw several important findings: 1) the temporal fusion module is effective in improving the robustness of the map segmentation model; 2) a strong backbone is effective for improving corruption robustness; and 3) some data augmentation methods are effective in improving the robustness of map segmentation models. These findings allowed us to achieve promising results in the 2024 RoboDrive Challenge Robust Map Segmentation Track.
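As a toy illustration of corruption-oriented data augmentation (the entry's exact recipe is not given in the abstract), one might perturb training images with sensor-style noise and photometric shifts:

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(img, severity=0.1):
    """Toy corruption-style augmentations: Gaussian noise plus a global
    brightness shift. `img` is a float array in [0, 1]; the severity value
    and the specific corruptions are illustrative assumptions only."""
    noisy = img + rng.normal(0.0, severity, img.shape)   # sensor noise
    bright = noisy + rng.uniform(-severity, severity)    # brightness shift
    return np.clip(bright, 0.0, 1.0)
```

Training on such perturbed copies alongside clean images is the usual way augmentation is used to harden a segmentation model against sensor corruptions.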
https://arxiv.org/abs/2405.10567
In computer vision and graphics, the accurate reconstruction of road surfaces is pivotal for various applications, especially in autonomous driving. This paper introduces a novel method leveraging the Multi-Layer Perceptron (MLP) framework to reconstruct road surfaces in terms of height, color, and semantic information from input world coordinates x and y. Our approach, NeRO, uses MLP-based encoding techniques, significantly improving performance on complex details, accelerating training, and reducing the size of the neural network. The effectiveness of this method is demonstrated through its superior performance, which indicates a promising direction for rendering road surfaces with semantics, particularly in applications demanding visualization of road conditions, 4D labeling, and semantic grouping.
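A minimal sketch of a coordinate MLP of this kind, assuming a Fourier-feature positional encoding and a single hidden layer (both hypothetical; the paper's exact encoding and head design may differ):

```python
import numpy as np

def positional_encoding(xy, n_freqs=4):
    """Fourier-feature encoding of world coordinates (x, y), as commonly
    used with coordinate MLPs."""
    freqs = 2.0 ** np.arange(n_freqs) * np.pi
    angles = xy[..., None] * freqs             # (..., 2, n_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*xy.shape[:-1], -1)     # (..., 2 * 2 * n_freqs)

def mlp_forward(xy, weights):
    """One hidden layer mapping encoded (x, y) to height, RGB, and K
    semantic-class logits (split sizes are illustrative)."""
    h = np.maximum(positional_encoding(xy) @ weights["W1"] + weights["b1"], 0.0)
    out = h @ weights["W2"] + weights["b2"]
    height, rgb, sem = out[..., :1], out[..., 1:4], out[..., 4:]
    return height, rgb, sem
```

The encoding lifts the two input coordinates into a higher-frequency feature space, which is what lets a small MLP fit fine road-surface detail.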
https://arxiv.org/abs/2405.10554
To support the testing of AVs, CETRAN has created a guideline for the evaluation of complex multi-agent test scenarios, presented in this report. This provides a clear, structured way to evaluate complexity elements based on the corresponding difficulties an AV might encounter in Singapore traffic. This study aims to understand the sources of complexity for AVs arising from traffic hazards, by breaking down the difficulties across AV capabilities: perception, situation awareness, and decision-making. The guidelines created through this study comprise a list of elements to be considered in the future as selection criteria for evaluating scenario complexity in support of AV behaviour assessment. This study is intended to be a guide for understanding the sources of complexity for AVs and can be used to challenge the risk management ability of autonomous vehicles in a scenario-based test approach or in traffic situations faced during road trials. The report applies the guidelines to evaluate the complexity of a set of 5 real events that occurred on Singapore roads, drawn from the Resembler web tool, a database of real accidents/incidents involving humans. Four scenarios were also designed for creation in simulation by the CETRAN team, applying the complexity-element guidelines created in this work, to illustrate the difficulties an ADS could experience in such scenarios.
https://arxiv.org/abs/2405.10526
This research analyzes, models and develops a novel Digital Learning Environment (DLE) fortified by the innovative Private Learning Intelligence (PLI) framework. The proposed PLI framework leverages federated machine learning (FL) techniques to autonomously construct and continuously refine personalized learning models for individual learners, ensuring robust privacy protection. Our approach is pivotal in advancing DLE capabilities, empowering learners to actively participate in personalized real-time learning experiences. The integration of PLI within a DLE also streamlines instructional design and development demands for personalized teaching/learning. We seek ways to establish a foundation for the seamless integration of FL into learning systems, offering a transformative approach to personalized learning in digital environments. Our implementation details and code are made public.
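As a sketch of the FL aggregation step such a framework could build on, here is standard federated averaging, which combines per-client (per-learner) model weights without sharing raw data; this is generic FedAvg, not necessarily PLI's exact aggregation rule:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Federated averaging: weight each client's model parameters by its
    local data size. Raw learner data never leaves the client, which is
    the privacy property FL-based designs rely on."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))
```

In a full round, each client trains locally, uploads only its parameters, and the server broadcasts the aggregate back; the personalized model then continues refining on-device.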
https://arxiv.org/abs/2405.10476
Over the last decade, a wide range of training and deployment strategies for Large Language Models (LLMs) have emerged. Among these, the prompting paradigms of auto-regressive LLMs (AR-LLMs) have catalyzed a significant surge in Artificial Intelligence (AI). This paper aims to emphasize the significance of utilizing free-form modalities (forms of input and output) and verbal free-form contexts as user-directed channels (methods for transforming modalities) for downstream deployment. Specifically, we analyze the structure of modalities within both types of LLMs and six task-specific channels during deployment. From the perspective of users, our analysis introduces and applies the analytical metrics of task customizability, transparency, and complexity to gauge their usability, highlighting the superior nature of AR-LLMs' prompting paradigms. Moreover, we examine the stimulation of diverse cognitive behaviors in LLMs through the adoption of free-form text and verbal contexts, mirroring human linguistic expressions of such behaviors. We then detail four common cognitive behaviors to underscore how AR-LLMs' prompting successfully imitates human-like behaviors using this free-form modality and channel. Lastly, the potential for improving LLM deployment, both as autonomous agents and within multi-agent systems, is identified via cognitive behavior concepts and principles.
https://arxiv.org/abs/2405.10474
Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage advanced reasoning and language processing capabilities to take a proactive, autonomous role in pursuing users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing agents that address the challenges of goal-seeking (including generating instrumental goals and plans), such as hallucinations inherent in foundation models, the explainability of the reasoning process, and complex accountability. To address this issue, we performed a systematic literature review to understand state-of-the-art foundation model-based agents and the broader ecosystem. In this paper, we present a pattern catalogue consisting of 16 architectural patterns, with analyses of the context, forces, and trade-offs, as the outcome of the literature review. The proposed catalogue can provide holistic guidance for the effective use of the patterns and support the architecture design of foundation model-based agents by facilitating goal-seeking and plan generation.
https://arxiv.org/abs/2405.10467
The advent of natural language processing and large language models (LLMs) has revolutionized the extraction of data from unstructured scholarly papers. However, ensuring data trustworthiness remains a significant challenge. In this paper, we introduce PropertyExtractor, an open-source tool that leverages advanced conversational LLMs like Google Gemini-Pro and OpenAI GPT-4, blends zero-shot with few-shot in-context learning, and employs engineered prompts for the dynamic refinement of structured information hierarchies, enabling autonomous, efficient, scalable, and accurate identification, extraction, and verification of material property data. Our tests on material data demonstrate precision and recall exceeding 93% with an error rate of approximately 10%, highlighting the effectiveness and versatility of the toolkit. We apply PropertyExtractor to generate a database of 2D material thicknesses, a critical parameter for device integration. The rapid evolution of the field has outpaced both experimental measurements and computational methods, creating a significant data gap. Our work addresses this gap and showcases the potential of PropertyExtractor as a reliable and efficient tool for the autonomous generation of diverse material property databases, advancing the field.
https://arxiv.org/abs/2405.10448
The exploration of under-ice environments presents unique challenges due to limited access for scientific research. This report investigates the potential of deploying a fully actuated Remotely Operated Vehicle (ROV) for shallow area exploration beneath ice sheets. Leveraging advancements in marine robotics technology, ROVs offer a promising solution for extending human presence into remote underwater locations. To enable successful under-ice exploration, the ROV must follow precise trajectories for effective localization signal reception. This study develops a multi-input-multi-output (MIMO) nonlinear system controller, incorporating a Lyapunov-based stability guarantee and an adaptation law to mitigate unknown environmental disturbances. Fuzzy logic is employed to dynamically adjust adaptation rates, enhancing performance in highly nonlinear ROV dynamic systems. Additionally, a Particle Swarm Optimization (PSO) algorithm automates the tuning of controller parameters for optimal trajectory tracking. The report details the ROV dynamic model, the proposed control framework, and the PSO-based tuning process. Simulation-based experiments validate the efficacy of the methodology, with experimental results demonstrating superior trajectory tracking performance compared to baseline controllers. This work contributes to the advancement of under-ice exploration capabilities and sets the stage for future research in marine robotics and autonomous underwater systems.
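The PSO-based controller tuning described above can be sketched as a standard particle swarm minimizing a scalar tracking cost over the gain vector. The hyperparameters and the quadratic test cost below are illustrative assumptions, not the report's settings:

```python
import numpy as np

def pso_tune(cost, dim, n_particles=20, iters=60, seed=0):
    """Minimal particle swarm optimization for controller-gain tuning.
    `cost` maps a gain vector to a scalar tracking cost (hypothetical)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))   # particle positions (gains)
    v = np.zeros_like(x)
    pbest, pbest_c = x.copy(), np.array([cost(p) for p in x])
    gbest = pbest[pbest_c.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        # inertia + attraction to personal and global bests
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = x + v
        c = np.array([cost(p) for p in x])
        improved = c < pbest_c
        pbest[improved], pbest_c[improved] = x[improved], c[improved]
        gbest = pbest[pbest_c.argmin()].copy()
    return gbest, pbest_c.min()
```

In the report's setting the cost would come from simulating trajectory tracking with the candidate gains; any black-box cost works, which is why PSO suits tuning nonlinear controllers.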
https://arxiv.org/abs/2405.10441
Learning in simulation and transferring the learned policy to the real world has the potential to enable generalist robots. The key challenge of this approach is to address simulation-to-reality (sim-to-real) gaps. Previous methods often require domain-specific knowledge a priori. We argue that a straightforward way to obtain such knowledge is by asking humans to observe and assist robot policy execution in the real world. The robots can then learn from humans to close various sim-to-real gaps. We propose TRANSIC, a data-driven approach to enable successful sim-to-real transfer based on a human-in-the-loop framework. TRANSIC allows humans to augment simulation policies to overcome various unmodeled sim-to-real gaps holistically through intervention and online correction. Residual policies can be learned from human corrections and integrated with simulation policies for autonomous execution. We show that our approach can achieve successful sim-to-real transfer in complex and contact-rich manipulation tasks such as furniture assembly. Through synergistic integration of policies learned in simulation and from humans, TRANSIC is effective as a holistic approach to addressing various, often coexisting sim-to-real gaps. It displays attractive properties such as scaling with human effort. Videos and code are available at this https URL
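The residual-policy idea can be sketched with a linear least-squares residual fitted to human corrections; TRANSIC itself learns neural residual policies, so the linear model and function names here are illustrative only:

```python
import numpy as np

def fit_residual(obs, corrections):
    """Fit a linear residual policy to logged (observation, human-correction)
    pairs via least squares - a toy stand-in for a learned residual."""
    X = np.hstack([obs, np.ones((len(obs), 1))])   # add a bias column
    W, *_ = np.linalg.lstsq(X, corrections, rcond=None)
    return W

def act(obs, sim_policy, W):
    """Deploy: simulation policy output plus the learned residual."""
    x = np.append(obs, 1.0)
    return sim_policy(obs) + x @ W
```

The residual only has to model the gap the human corrected, so the simulation policy stays intact and autonomous execution recovers the corrected behavior.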
https://arxiv.org/abs/2405.10315
Active reconstruction techniques enable robots to autonomously collect scene data for full coverage, relieving users of the tedious and time-consuming data capturing process. However, because they are designed around unsuitable scene representations, existing methods either produce unrealistic reconstruction results or are unable to evaluate quality online. With recent advancements in explicit radiance field technology, online active high-fidelity reconstruction has become achievable. In this paper, we propose GS-Planner, a planning framework for active high-fidelity reconstruction using 3D Gaussian Splatting. By improving 3DGS to recognize unobserved regions, we evaluate the reconstruction quality and completeness of the 3DGS map online to guide the robot. We then design a sampling-based active reconstruction strategy to explore the unobserved areas and improve the geometric and textural quality of the reconstruction. To establish a complete robotic active reconstruction system, we choose a quadrotor as the robotic platform for its high agility. We then devise a safety constraint with 3DGS to generate executable trajectories for quadrotor navigation in the 3DGS map. To validate the effectiveness of our method, we conduct extensive experiments and ablation studies in highly realistic simulation scenes.
https://arxiv.org/abs/2405.10142
In autonomous driving, accurately interpreting the movements of other road users and leveraging this knowledge to forecast future trajectories is crucial. This is typically achieved through the integration of map data and tracked trajectories of various agents. Numerous methodologies combine this information into a single embedding per agent, which is then used to predict future behavior. However, these approaches have a notable drawback: they may lose exact location information during the encoding process. While the encoding still includes general map information, the generation of valid and consistent trajectories is not guaranteed, which can cause the predicted trajectories to stray from the actual lanes. This paper introduces a new refinement module designed to project the predicted trajectories back onto the actual map, rectifying these discrepancies and leading towards more consistent predictions. This versatile module can be readily incorporated into a wide range of architectures. Additionally, we propose a novel scene encoder that handles all relations between agents and their environment in a single unified heterogeneous graph attention network. By analyzing the attention values on the different edges of this graph, we can gain unique insights into the neural network's inner workings, leading towards more explainable predictions.
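A minimal version of such a map-projection refinement, assuming the lane is given as a 2D centerline polyline (the paper's module is learned and more general than this geometric snap):

```python
import numpy as np

def project_to_lane(traj, centerline):
    """Snap each predicted 2D trajectory point to the closest point on a
    lane centerline polyline (closest point over all segments)."""
    a, b = centerline[:-1], centerline[1:]     # segment endpoints
    ab = b - a
    out = []
    for p in traj:
        # parameter of the closest point on each segment, clipped to [0, 1]
        t = np.clip(((p - a) * ab).sum(1) / (ab * ab).sum(1), 0.0, 1.0)
        cand = a + t[:, None] * ab             # closest candidate per segment
        out.append(cand[np.linalg.norm(cand - p, axis=1).argmin()])
    return np.array(out)
```

A refinement module of this flavor guarantees the output stays on drivable geometry regardless of how far the raw decoder output drifted.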
https://arxiv.org/abs/2405.10134
In the typical urban intersection scenario, both vehicles and infrastructure are equipped with visual and LiDAR sensors. By successfully integrating data from vehicle-side and road monitoring devices, a more comprehensive and accurate environmental perception and information acquisition can be achieved. The calibration of sensors, an essential component of autonomous driving technology, has consistently drawn significant attention. Particularly in scenarios where multiple sensors collaboratively perceive and address localization challenges, inter-sensor calibration becomes crucial. Recent years have witnessed the emergence of the concept of multi-end cooperation, where infrastructure captures and transmits surrounding environment information to vehicles, bolstering their perception capabilities while mitigating costs. However, this also introduces technical complexities, underscoring the pressing need for calibration across diverse ends. Camera and LiDAR, the bedrock sensors of autonomous driving, exhibit expansive applicability. This paper comprehensively examines and analyzes the calibration of multi-end camera-LiDAR setups from vehicle, roadside, and vehicle-road cooperation perspectives, outlining their relevant applications and profound significance. We conclude with a summary and present our future-oriented ideas and hypotheses.
https://arxiv.org/abs/2405.10132
Infrared physical adversarial examples are of great significance for studying the security of infrared AI systems that are widely used in our lives, such as in autonomous driving. Previous infrared physical attacks mainly focused on 2D infrared pedestrian detection, which may not fully manifest their destructiveness to AI systems. In this work, we propose a physical attack method against infrared detectors based on 3D modeling, which is applied to a real car. The goal is to design a set of infrared adversarial stickers that make cars invisible to infrared detectors at various viewing angles, distances, and scenes. We build a 3D infrared car model with real infrared characteristics and propose an infrared adversarial pattern generation method based on 3D mesh shadows. We propose a 3D control-point-based mesh smoothing algorithm and use a set of smoothness loss functions to enhance the smoothness of adversarial meshes and facilitate sticker implementation. In addition, we designed aluminum stickers and conducted physical experiments on two real Mercedes-Benz A200L cars. Our adversarial stickers hid the cars from Faster R-CNN, an object detector, at various viewing angles, distances, and scenes. The attack success rate (ASR) was 91.49% for real cars. In comparison, the ASRs of random stickers and no sticker were only 6.21% and 0.66%, respectively. In addition, the ASRs of the designed stickers against six unseen object detectors, such as YOLOv3 and Deformable DETR, were between 73.35% and 95.80%, showing good transferability of the attack performance across detectors.
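One common smoothness term of the kind described is a Laplacian-style loss penalizing each vertex's deviation from the centroid of its neighbors; the paper combines several such losses, so this is a generic sketch with an assumed adjacency structure:

```python
import numpy as np

def laplacian_smoothness(verts, neighbors):
    """Mesh smoothness loss: mean squared distance of each listed vertex to
    the centroid of its neighbors. `neighbors` maps a vertex index to a
    list of adjacent vertex indices."""
    loss = 0.0
    for i, nbrs in neighbors.items():
        loss += ((verts[i] - verts[list(nbrs)].mean(0)) ** 2).sum()
    return loss / len(neighbors)
```

Minimizing such a term alongside the adversarial objective keeps the optimized mesh (and hence the physical sticker pattern) free of high-frequency wrinkles that would be hard to fabricate.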
https://arxiv.org/abs/2405.09924
Multi-line LiDAR is widely used in autonomous vehicles, so point cloud-based 3D detectors are essential for autonomous driving. Extracting rich multi-scale features is crucial for point cloud-based 3D detectors in autonomous driving due to significant differences in the size of different types of objects. However, due to real-time requirements, large-size convolution kernels are rarely used to extract large-scale features in the backbone. Current 3D detectors commonly use feature pyramid networks to obtain large-scale features; however, objects containing fewer points are further lost during downsampling, resulting in degraded performance. Since pillar-based schemes require much less computation than voxel-based schemes, they are more suitable for constructing real-time 3D detectors. Hence, we propose PillarNeXt, a pillar-based scheme. We redesign the feature encoding, the backbone, and the neck of the 3D detector. We propose Voxel2Pillar feature encoding, which uses a sparse convolution constructor to construct pillars with richer point cloud features, especially height features. Moreover, additional learnable parameters are added, enabling the initial pillars to achieve higher representational capability. We extract multi-scale and large-scale features in the proposed fully sparse backbone, which does not use large-size convolutional kernels; the backbone consists of the proposed multi-scale feature extraction modules. The neck consists of the proposed sparse ConvNeXt, whose simple structure significantly improves performance. The effectiveness of the proposed PillarNeXt is validated on the Waymo Open Dataset, where object detection accuracy for vehicles, pedestrians, and cyclists is improved; we also verify the effectiveness of each proposed module in detail.
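The basic pillarization step underlying any pillar-based encoder can be sketched as scattering points into an x-y grid and computing simple per-pillar statistics; Voxel2Pillar's sparse-convolution constructor and learnable parameters are not reproduced here:

```python
import numpy as np

def pillarize(points, grid=(4, 4), extent=2.0):
    """Scatter an (N, 3) point cloud into an x-y pillar grid over
    [-extent, extent]^2 and compute toy per-pillar features:
    point count and mean height."""
    ij = np.floor((points[:, :2] + extent) / (2 * extent) * np.array(grid)).astype(int)
    ij = np.clip(ij, 0, np.array(grid) - 1)          # keep boundary points in-grid
    count = np.zeros(grid)
    zsum = np.zeros(grid)
    for (i, j), z in zip(ij, points[:, 2]):
        count[i, j] += 1
        zsum[i, j] += z
    mean_z = np.where(count > 0, zsum / np.maximum(count, 1), 0.0)
    return count, mean_z
```

Because pillars collapse the z axis, the 2D feature map can be processed with cheap 2D (sparse) convolutions, which is the computational advantage over voxel schemes that the abstract highlights.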
https://arxiv.org/abs/2405.09828
This paper addresses the problem of object-goal navigation in autonomous inspections in real-world environments. Object-goal navigation is crucial to enable effective inspections in various settings, often requiring the robot to identify the target object within a large search space. Current object inspection methods fall short of human efficiency because they typically cannot bootstrap prior and common sense knowledge as humans do. In this paper, we introduce a framework that enables robots to use semantic knowledge from prior spatial configurations of the environment and semantic common sense knowledge. We propose SEEK (Semantic Reasoning for Object Inspection Tasks) that combines semantic prior knowledge with the robot's observations to search for and navigate toward target objects more efficiently. SEEK maintains two representations: a Dynamic Scene Graph (DSG) and a Relational Semantic Network (RSN). The RSN is a compact and practical model that estimates the probability of finding the target object across spatial elements in the DSG. We propose a novel probabilistic planning framework to search for the object using relational semantic knowledge. Our simulation analyses demonstrate that SEEK outperforms the classical planning and Large Language Models (LLMs)-based methods that are examined in this study in terms of efficiency for object-goal inspection tasks. We validated our approach on a physical legged robot in urban environments, showcasing its practicality and effectiveness in real-world inspection scenarios.
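The probabilistic search idea can be illustrated with a simple Bayesian belief update over candidate locations after unsuccessful searches; the RSN's actual probabilities come from learned semantic relations, so the uniform prior and miss probability below are stand-ins:

```python
import numpy as np

def update_beliefs(prior, searched, miss_prob=0.2):
    """Bayes update of per-location target probabilities after searching
    some locations without finding the object. `miss_prob` is the assumed
    chance of overlooking the object when it is actually present."""
    post = prior.copy()
    for i in searched:
        post[i] *= miss_prob     # P(not seen | present) down-weights the location
    return post / post.sum()     # renormalize over remaining hypotheses
```

A planner can then repeatedly search the highest-probability location, fold in the result, and continue, which is the essence of probabilistic object-goal search.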
https://arxiv.org/abs/2405.09822
Odometry is a crucial component of successful autonomous navigation, relying on sensors such as cameras, LiDARs, and IMUs. However, these sensors may encounter challenges in extreme weather conditions, such as snowfall and fog. The emergence of FMCW radar technology offers the potential for robust perception in adverse conditions. As the latest generation of FMCW radars, the 4D mmWave radar provides point clouds with range, azimuth, elevation, and Doppler velocity information, despite inherent sparsity and noise in the point cloud. In this paper, we propose EFEAR-4D, an accurate, highly efficient, and learning-free method for large-scale 4D radar odometry estimation. EFEAR-4D delicately exploits Doppler velocity information for robust ego-velocity estimation, resulting in a highly accurate prior guess. EFEAR-4D maintains robustness against point-cloud sparsity and noise across diverse environments through dynamic object removal and effective region-wise feature extraction. Extensive experiments on two publicly available 4D radar datasets demonstrate the state-of-the-art reliability and localization accuracy of EFEAR-4D under various conditions. Furthermore, we have collected a dataset following the same route but with varying installation heights of the 4D radar, emphasizing the significant impact of radar height on point cloud quality, a crucial consideration for real-world deployments. Our algorithm and dataset will be available soon at this https URL.
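Ego-velocity estimation from Doppler returns admits a simple least-squares formulation on static points, which illustrates the kind of prior guess the abstract describes (the paper's robust estimator additionally rejects dynamic points and outliers):

```python
import numpy as np

def ego_velocity_from_doppler(directions, doppler):
    """Least-squares ego-velocity from radar Doppler returns on static points.

    For a static point observed along unit direction d, the measured radial
    (Doppler) velocity is v_r = -d @ v_ego, so stacking all returns gives an
    overdetermined linear system in the 3D ego-velocity.
    """
    v, *_ = np.linalg.lstsq(-directions, doppler, rcond=None)
    return v
```

With at least three non-coplanar return directions the system is well-posed, and the solution is exact on noise-free static points.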
https://arxiv.org/abs/2405.09780