Bounded rational agents often make decisions by evaluating a finite selection of choices, typically derived from a reference point termed the `default policy', based on previous experience. However, the inherent rigidity of a static default policy presents significant challenges when agents operate in unknown environments that are not covered by their prior knowledge. In this work, we introduce a context-generative default policy that leverages the region observed by the robot to predict the unobserved part of the environment, thereby enabling the robot to adaptively adjust its default policy based on both the actual observed map and the \textit{imagined} unobserved map. Furthermore, the adaptive nature of the bounded rationality framework enables the robot to manage unreliable or incorrect imaginations by selectively sampling a few trajectories in the vicinity of the default policy. Our approach uses a diffusion model for map prediction and sampling-based planning with B-spline trajectory optimization to generate the default policy. Extensive evaluations reveal that the context-generative policy outperforms the baseline methods in identifying and avoiding unseen obstacles. Additionally, real-world experiments conducted with Crazyflie drones demonstrate the adaptability of our proposed method, even when acting in environments outside the domain of the training distribution.
https://arxiv.org/abs/2409.11604
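To make the idea of sampling a few trajectories around a default policy concrete, here is a minimal Python sketch (not the authors' implementation: the scalar control points, Gaussian perturbation, and all names are illustrative assumptions):

```python
import random

def bspline_point(ctrl, t):
    """Evaluate a uniform cubic B-spline segment at t in [0, 1]
    from four control points (scalars for simplicity)."""
    p0, p1, p2, p3 = ctrl
    b0 = (1 - t) ** 3 / 6.0
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6.0
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0
    b3 = t**3 / 6.0
    return b0 * p0 + b1 * p1 + b2 * p2 + b3 * p3

def sample_near_default(default_ctrl, n=5, sigma=0.1, rng=None):
    """Bounded-rational sampling: only a few candidate trajectories,
    obtained by perturbing the default policy's control points."""
    rng = rng or random.Random(0)
    return [[c + rng.gauss(0.0, sigma) for c in default_ctrl]
            for _ in range(n)]

default = [0.0, 1.0, 2.0, 3.0]
candidates = sample_near_default(default)
candidates.append(default)          # the default itself stays in the set
# pick the candidate whose midpoint best matches a (hypothetical) target
best = min(candidates, key=lambda c: abs(bspline_point(c, 0.5) - 1.5))
```

Evaluating only a handful of candidates near the default is what lets the agent discard an unreliable "imagined" map cheaply: a bad prediction shifts the default, but the sampled neighborhood still contains safe corrections.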
An inherent fragility of quadrotor systems stems from model inaccuracies and external disturbances. These factors hinder performance and compromise the stability of the system, making precise control challenging. Existing model-based approaches either make deterministic assumptions, utilize Gaussian-based representations of uncertainty, or rely on nominal models, all of which often fall short in capturing the complex, multimodal nature of real-world dynamics. This work introduces DroneDiffusion, a novel framework that leverages conditional diffusion models to learn quadrotor dynamics, formulated as a sequence generation task. DroneDiffusion achieves superior generalization to unseen, complex scenarios by capturing the temporal nature of uncertainties and mitigating error propagation. We integrate the learned dynamics with an adaptive controller for trajectory tracking with stability guarantees. Extensive experiments in both simulation and real-world flights demonstrate the robustness of the framework across a range of scenarios, including unfamiliar flight paths and varying payloads, velocities, and wind disturbances.
https://arxiv.org/abs/2409.11292
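The sequence-generation view can be illustrated with the standard DDPM reverse update that conditional diffusion models build on; this scalar sketch stubs out the noise-prediction network (`eps_pred = 0.0`) and uses a toy schedule, so it shows the mechanics only, not DroneDiffusion itself:

```python
import math
import random

def make_schedule(steps=10, beta0=0.01):
    """Toy linear variance schedule: beta_t, alpha_t = 1 - beta_t,
    and the running product alpha_bar_t."""
    betas = [beta0 * (i + 1) for i in range(steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return alphas, alpha_bars

def ddpm_reverse_step(x_t, t, eps_pred, alphas, alpha_bars, rng):
    """One DDPM denoising step for a scalar state sample:
    x_{t-1} = (x_t - (1-alpha_t)/sqrt(1-alpha_bar_t) * eps_pred) / sqrt(alpha_t) + sigma_t * z."""
    a_t, ab_t = alphas[t], alpha_bars[t]
    mean = (x_t - (1.0 - a_t) / math.sqrt(1.0 - ab_t) * eps_pred) / math.sqrt(a_t)
    sigma = math.sqrt(1.0 - a_t) if t > 0 else 0.0   # no noise on the final step
    return mean + sigma * rng.gauss(0.0, 1.0)

alphas, alpha_bars = make_schedule()
rng = random.Random(0)
x = rng.gauss(0.0, 1.0)                 # start from pure noise
for t in reversed(range(len(alphas))):
    eps_pred = 0.0                       # stub for the conditional network's output
    x = ddpm_reverse_step(x, t, eps_pred, alphas, alpha_bars, rng)
```

In the paper's setting the network would be conditioned on the state/input sequence, so the sampled output is a dynamics residual rather than a generic sample.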
Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision. Current MOT trackers rely on accurate object detection results and precise matching for target re-identification (ReID). These methods focus on optimizing target spatial attributes while overlooking temporal cues when modelling object relationships, especially under challenging tracking conditions such as object deformation and blurring. To address the above-mentioned issues, we propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT), which utilizes historical embedding features to model the representation of ReID and detection features in sequential order. Concretely, a temporal embedding boosting module is introduced to enhance the discriminability of individual embeddings based on adjacent-frame cooperation. The trajectory embedding is then propagated by a temporal detection refinement module to mine salient target locations in the temporal field. Extensive experiments on the VisDrone2019 and UAVDT datasets demonstrate that our STCMOT sets a new state-of-the-art performance in MOTA and IDF1 metrics. The source codes are released at this https URL.
https://arxiv.org/abs/2409.11234
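As a hypothetical stand-in for the temporal embedding idea (the actual module is learned; the EMA weighting and all names here are assumptions), blending a track's history into the current ReID embedding could look like:

```python
def boost_embedding(history, current, momentum=0.9):
    """Blend the current frame's ReID embedding with its track history
    via an exponential moving average; a crude stand-in for the paper's
    learned temporal embedding boosting module."""
    if history is None:
        return list(current)
    return [momentum * h + (1.0 - momentum) * c for h, c in zip(history, current)]

def cosine(a, b):
    """Cosine similarity, the usual metric for matching detections to tracks."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return num / den

# a blurred frame yields a noisy embedding; history keeps the track stable
track = None
for frame_emb in ([1.0, 0.0], [0.9, 0.1], [0.2, 0.8]):
    track = boost_embedding(track, frame_emb)
```

The point of the smoothing is that a single deformed or blurred frame cannot drag the track embedding far from its history, which is exactly the failure mode the abstract highlights.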
Efficient exploration of large-scale environments remains a critical challenge in robotics, with applications ranging from environmental monitoring to search and rescue operations. This article proposes a bio-mimetic multi-robot framework, \textit{Frontier Shepherding (FroShe)}, for large-scale exploration. The presented bio-inspired framework heuristically models frontier exploration on the shepherding behavior of herding dogs: frontiers are modeled as a sheep swarm reacting to robots modeled as shepherding dogs. The framework is robust across varying environment sizes and obstacle densities and can be easily deployed across multiple agents. Simulation results showcase that the proposed method performs consistently irrespective of the simulated environment's size and obstacle density. As the number of agents increases, the proposed method outperforms other state-of-the-art exploration methods, with an average improvement of $20\%$ over the next-best approach (for $3$ UAVs). The proposed technique was implemented and tested in single- and dual-drone scenarios in a real-world forest-like environment.
https://arxiv.org/abs/2409.10931
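The sheep/dog analogy can be sketched as a simple force model; the gains, influence radius, and point-mass setting below are illustrative assumptions, not FroShe's actual update rule:

```python
def shepherd_step(sheep, dog, center, dt=0.1, k_rep=1.0, k_att=0.2, radius=2.0):
    """One update of a frontier point modeled as a sheep: repelled by a
    nearby shepherd (robot), weakly attracted toward the flock center."""
    dx, dy = sheep[0] - dog[0], sheep[1] - dog[1]
    d = (dx * dx + dy * dy) ** 0.5
    fx = fy = 0.0
    if 0 < d < radius:                 # repulsion only inside the influence radius
        fx += k_rep * dx / d
        fy += k_rep * dy / d
    fx += k_att * (center[0] - sheep[0])
    fy += k_att * (center[1] - sheep[1])
    return (sheep[0] + dt * fx, sheep[1] + dt * fy)
```

Driving the "sheep" (frontiers) with such forces lets each robot steer the unexplored boundary as a herd rather than chasing frontiers individually, which is what makes the heuristic scale with the number of agents.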
In the rapidly evolving field of vision-language navigation (VLN), ensuring robust safety mechanisms remains an open challenge. Control barrier functions (CBFs) are efficient tools that guarantee safety by solving an optimal control problem. In this work, we consider the case of a teleoperated drone in a VLN setting, and add safety features by formulating a novel scene-aware CBF using ego-centric observations obtained through an RGB-D sensor. As a baseline, we implement a vision-language understanding module which uses the contrastive language image pretraining (CLIP) model to query a user-specified (natural-language) landmark. Using the YOLO (You Only Look Once) object detector, the CLIP model is queried to verify the cropped landmark, triggering downstream navigation. To improve the navigation safety of the baseline, we propose ASMA -- an Adaptive Safety Margin Algorithm -- that crops the drone's depth map to track moving object(s) and perform scene-aware CBF evaluation on-the-fly. By identifying potentially risky observations in the scene, ASMA enables real-time adaptation to unpredictable environmental conditions, ensuring optimal safety bounds on a VLN-powered drone's actions. Using the Robot Operating System (ROS) middleware on a Parrot Bebop2 quadrotor in the Gazebo environment, ASMA offers a 59.4% - 61.8% increase in success rates, with insignificant 5.4% - 8.2% increases in trajectory lengths compared to the baseline CBF-less VLN, while recovering from unsafe situations.
https://arxiv.org/abs/2409.10283
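The CBF machinery behind such safety filters reduces, in one dimension, to a closed-form projection; this scalar toy (not ASMA's scene-aware formulation; it assumes the agent sits to the left of a single obstacle) enforces the discrete CBF condition h(x') >= (1 - gamma) h(x):

```python
def cbf_filter(x, u_nom, obstacle, r_safe, dt=0.1, gamma=0.5):
    """Minimally invasive 1D safety filter.
    Barrier: h(x) = |x - obstacle| - r_safe.
    Keep the nominal action when h(x + u*dt) >= (1 - gamma) * h(x);
    otherwise project onto the constraint boundary (closed form instead
    of the QP used in higher dimensions)."""
    h = abs(x - obstacle) - r_safe
    x_next = x + u_nom * dt
    h_next = abs(x_next - obstacle) - r_safe
    if h_next >= (1 - gamma) * h:
        return u_nom                         # nominal action already safe
    # assumes x < obstacle: the admissible set is x_next <= x_bound
    x_bound = obstacle - (r_safe + (1 - gamma) * h)
    return (x_bound - x) / dt
```

Because the filter returns `u_nom` whenever the condition already holds, it only intervenes in unsafe situations, which is the "minimally invasive" property these papers rely on.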
One of the most useful applications of intelligent aerial robots, sometimes called Unmanned Aerial Vehicles (UAVs), in Australia is known to be bushfire monitoring and prediction operations. A swarm of autonomous drones/UAVs programmed to observe fire parameters in real time using their onboard sensors would be valuable in reducing the life-threatening impact of such a fire. However, autonomous UAVs face serious challenges in positioning and navigation under critical bushfire conditions, such as remoteness and severe weather, where GPS signals can also be unreliable. This paper tackles one of the most important factors in autonomous UAV navigation, namely initial positioning, sometimes called localisation. The solution provided by this paper will enable a team of autonomous UAVs to establish a relative position to their base of operation, so that they can commence a team search and reconnaissance in a bushfire-affected area and find their way back to their base without the help of GPS signals.
https://arxiv.org/abs/2409.10193
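GPS-free relative positioning of this kind is often built on ranging to known anchors at the base; a minimal trilateration sketch (2D, three beacons, exact ranges — all assumptions for illustration, since the abstract does not state the sensing modality):

```python
def trilaterate(anchors, ranges):
    """Relative 2D position from ranges to three non-collinear anchors
    (e.g., radio beacons at the base of operation).
    Subtracting the first circle equation from the other two yields a
    2x2 linear system in (x, y)."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = ranges
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21      # nonzero iff anchors are non-collinear
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det)
```

With noisy ranges the same linear system is solved in a least-squares sense, but the geometry is identical.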
This paper presents the concept of Industry 6.0, introducing the world's first fully automated production system that autonomously handles the entire product design and manufacturing process based on user-provided natural language descriptions. By leveraging generative AI, the system automates critical aspects of production, including product blueprint design, component manufacturing, logistics, and assembly. A heterogeneous swarm of robots, each equipped with individual AI through integration with Large Language Models (LLMs), orchestrates the production process. The robotic system includes manipulator arms, delivery drones, and 3D printers capable of generating assembly blueprints. The system was evaluated using commercial and open-source LLMs, functioning through APIs and local deployment. A user study demonstrated that the system reduces the average production time to 119.10 minutes, significantly outperforming a team of expert human developers, who averaged 528.64 minutes (an improvement factor of 4.4). Furthermore, in the product blueprinting stage, the system surpassed human CAD operators by an unprecedented factor of 47, completing the task in 0.5 minutes compared to 23.5 minutes. This breakthrough represents a major leap towards fully autonomous manufacturing.
https://arxiv.org/abs/2409.10106
The DARPA Subterranean Challenge is leading the development of robots capable of mapping underground mines and tunnels up to 8 km in length and identifying objects and people. Developing these autonomous abilities paves the way for future planetary cave and surface exploration missions. The Co-STAR team, competing in this challenge, is developing a hybrid aerial-ground vehicle, known as the Rollocopter. The current design of this vehicle is a drone with wheels attached. This allows the vehicle to roll, actuated by the propellers, and fly only when necessary, thereby benefiting from the reduced power consumption of the ground mode and the enhanced mobility of the aerial mode. This thesis focuses on the development and increased robustness of the local planning architecture for the Rollocopter. The first development of this thesis is a local planner capable of collision avoidance. The local planning node provides the basic functionality required for the vehicle to navigate autonomously. The next stage was augmenting this with the ability to plan more reliably without localisation. This was then integrated with a hybrid mobility mode capable of rolling and flying to exploit the power and mobility benefits of the respective configurations. A traversability analysis algorithm, which determines the terrain the vehicle is able to traverse, is in the late stages of development for informing the decisions of the hybrid planner. A simulator was developed to test the planning algorithms and improve the robustness of the vehicle in different environments. The results presented in this thesis relate to the mobility of the Rollocopter and the range of environments that the vehicle is capable of traversing. Videos are included in which the vehicle successfully navigates through dust-ridden tunnels, horizontal mazes, and areas with rough terrain.
https://arxiv.org/abs/2409.09967
SAFER-Splat (Simultaneous Action Filtering and Environment Reconstruction) is a real-time, scalable, and minimally invasive action filter, based on control barrier functions, for safe robotic navigation in a detailed map constructed at runtime using Gaussian Splatting (GSplat). We propose a novel Control Barrier Function (CBF) that not only induces safety with respect to all Gaussian primitives in the scene, but, when synthesized into a controller, is capable of processing hundreds of thousands of Gaussians while maintaining a minimal memory footprint and operating at 15 Hz during online Splat training. Only a small fraction of the total compute time consumes GPU resources, enabling uninterrupted training. The safety layer is minimally invasive, correcting robot actions only when they are unsafe. To showcase the safety filter, we also introduce SplatBridge, an open-source software package built with ROS for real-time GSplat mapping on robots. We demonstrate the safety and robustness of our pipeline first in simulation, where our method is 20-50x faster, safer, and less conservative than competing methods based on neural radiance fields. Further, we demonstrate simultaneous GSplat mapping and safety filtering on a drone hardware platform using only on-board perception. We verify that under teleoperation a human pilot cannot invoke a collision. Our videos and codebase can be found at this https URL.
https://arxiv.org/abs/2409.09868
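A toy version of a barrier defined over Gaussian primitives (isotropic Gaussians and a bisection line search stand in for SAFER-Splat's anisotropic primitives and QP-based controller; the current state is assumed safe):

```python
def splat_h(x, gaussians, level=1.0):
    """Barrier value against isotropic Gaussian primitives (mu, sigma):
    positive when x lies outside every primitive's level-sigma ball."""
    best = float("inf")
    for mu, sigma in gaussians:
        d = sum((a - b) ** 2 for a, b in zip(x, mu)) ** 0.5
        best = min(best, d - level * sigma)
    return best

def filter_action(x, u, gaussians, dt=0.05, gamma=0.5, iters=20):
    """Correct the action only when the discrete CBF condition
    h(x + u*dt) >= (1 - gamma) * h(x) fails; otherwise pass it through.
    Bisection on the action scale replaces the QP of the real system."""
    h0 = splat_h(x, gaussians)
    def ok(scale):
        xn = [a + scale * b * dt for a, b in zip(x, u)]
        return splat_h(xn, gaussians) >= (1.0 - gamma) * h0
    if ok(1.0):
        return u                      # minimally invasive: leave the action alone
    lo, hi = 0.0, 1.0                 # ok(0) holds because h0 >= 0 by assumption
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ok(mid):
            lo = mid
        else:
            hi = mid
    return [lo * b for b in u]
```

Scaling per-primitive: because `splat_h` is a minimum over all Gaussians, the same filter applies unchanged whether the scene holds one primitive or hundreds of thousands; the paper's contribution is making that evaluation fast on GPU alongside training.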
It's possible to distribute the Internet to users via drones; however, the drones must then be placed according to the positions of the users. Moreover, the 5th Generation (5G) New Radio (NR) technology is designed to accommodate a wide range of applications and industries. The NGMN 5G White Paper \cite{5gwhitepaper} groups these vertical use cases into three categories: enhanced Mobile Broadband (eMBB), massive Machine Type Communication (mMTC), and Ultra-Reliable Low-latency Communication (URLLC). Partitioning the physical network into multiple virtual networks appears to be the best way to provide a customised service for each application and limit operational costs. This design is well known as \textit{network slicing}. Each drone must thus slice its bandwidth between the three user classes. This whole problem (placement + bandwidth) can be defined as an optimization problem, but since it is very hard to solve efficiently, it is almost always addressed with AI in the literature. In my internship, I wanted to show that viewing the problem as an optimization problem can still be useful, by building a hybrid solution involving AI on one hand and optimization on the other. I use it to achieve better results than approaches that use only AI, although at the cost of slightly larger (but still reasonable) computation times.
https://arxiv.org/abs/2409.11432
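A minimal sketch of the per-drone bandwidth-slicing subproblem (the demand-first-then-weights rule is an illustrative heuristic, not the internship's actual optimizer):

```python
def slice_bandwidth(total, demands, weights):
    """Split one drone's bandwidth across the three slice classes
    (eMBB, mMTC, URLLC): first satisfy minimum demands, scaling them
    down proportionally if infeasible, then share any remainder in
    proportion to slice weights."""
    need = sum(demands)
    if need >= total:
        scale = total / need
        return [d * scale for d in demands]
    rest = total - need
    wsum = sum(weights)
    return [d + rest * w / wsum for d, w in zip(demands, weights)]
```

The full problem couples this per-drone allocation with drone placement, which is what makes a pure optimization formulation hard and motivates the AI/optimization hybrid.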
Zero-shot coordination (ZSC) is a significant challenge in multi-agent collaboration, aiming to develop agents that can coordinate with unseen partners they have not encountered before. Recent cutting-edge ZSC methods have primarily focused on two-player video games such as OverCooked!2 and Hanabi. In this paper, we extend the scope of ZSC research to the multi-drone cooperative pursuit scenario, exploring how to construct a drone agent capable of coordinating with multiple unseen partners to capture multiple evaders. We propose a novel Hypergraphic Open-ended Learning Algorithm (HOLA-Drone) that continuously adapts the learning objective based on our hypergraphic-form game modeling, aiming to improve cooperative abilities with multiple unknown drone teammates. To empirically verify the effectiveness of HOLA-Drone, we build two different unseen drone teammate pools to evaluate their performance in coordination with various unseen partners. The experimental results demonstrate that HOLA-Drone outperforms the baseline methods in coordination with unseen drone teammates. Furthermore, real-world experiments validate the feasibility of HOLA-Drone in physical systems. Videos can be found on the project homepage~\url{this https URL}.
https://arxiv.org/abs/2409.08767
The question of how cyber-physical systems should interact with human partners who can take over control or exert oversight is becoming more pressing, as these systems are deployed for an ever larger range of tasks. Drawing on the literature on handing over control during semi-autonomous driving and on human-robot interaction, we propose a design for a take-over request (TOR) that combines an abstract pre-alert with an informative TOR: relevant sensor information is highlighted on the controller's display, while a spoken message verbalizes the reason for the TOR. We conduct our study in the context of a semi-autonomous drone control scenario as our testbed. The goal of our online study is to assess in more detail what form a language-based TOR should take. Specifically, we compare a full-sentence condition to shorter fragments, and test whether the visual highlighting should be done synchronously or asynchronously with the speech. Participants showed a higher accuracy in choosing the correct solution with our bi-modal TOR and felt that they were better able to recognize the critical situation. Using only fragments in the spoken message rather than full sentences did not lead to improved accuracy or faster reactions. Also, synchronizing the visual highlighting with the spoken message did not result in better accuracy, and response times even increased in this condition.
https://arxiv.org/abs/2409.08253
The global increase in observed forest dieback, characterised by the death of tree foliage, heralds widespread decline in forest ecosystems. This degradation causes significant changes to ecosystem services and functions, including habitat provision and carbon sequestration, which can be difficult to detect using traditional monitoring techniques, highlighting the need for large-scale and high-frequency monitoring. Contemporary developments in the instruments and methods to gather and process data at large-scales mean this monitoring is now possible. In particular, the advancement of low-cost drone technology and deep learning on consumer-level hardware provide new opportunities. Here, we use an approach based on deep learning and vegetation indices to assess crown dieback from RGB aerial data without the need for expensive instrumentation such as LiDAR. We use an iterative approach to match crown footprints predicted by deep learning with field-based inventory data from a Mediterranean ecosystem exhibiting drought-induced dieback, and compare expert field-based crown dieback estimation with vegetation index-based estimates. We obtain high overall segmentation accuracy (mAP: 0.519) without the need for additional technical development of the underlying Mask R-CNN model, underscoring the potential of these approaches for non-expert use and proving their applicability to real-world conservation. We also find colour-coordinate based estimates of dieback correlate well with expert field-based estimation. Substituting ground truth for Mask R-CNN model predictions showed negligible impact on dieback estimates, indicating robustness. Our findings demonstrate the potential of automated data collection and processing, including the application of deep learning, to improve the coverage, speed and cost of forest dieback monitoring.
https://arxiv.org/abs/2409.08171
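One common RGB-only vegetation index usable for such colour-coordinate estimates is the Green Leaf Index (GLI); the thresholding rule below is an illustrative assumption (the paper calibrates its estimates against expert field data):

```python
def gli(r, g, b):
    """Green Leaf Index for one RGB pixel: (2G - R - B) / (2G + R + B).
    Near +1 for green foliage, at or below 0 for brown/dead material."""
    denom = 2 * g + r + b
    return 0.0 if denom == 0 else (2 * g - r - b) / denom

def dieback_fraction(pixels, threshold=0.0):
    """Crude crown dieback estimate: the share of crown pixels whose
    GLI falls at or below a threshold."""
    flagged = sum(1 for (r, g, b) in pixels if gli(r, g, b) <= threshold)
    return flagged / len(pixels)
```

In the paper's pipeline the pixel set for each crown comes from the Mask R-CNN crown segmentation, so the index is evaluated per delineated crown rather than over the whole orthomosaic.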
Considering the growing prominence of production-level AI and the threat of adversarial attacks that can evade a model at run-time, evaluating the robustness of models to these evasion attacks is of critical importance. Additionally, testing model changes likely means deploying the models to a device (e.g. a car, a medical imaging device, or a drone) to see how the change affects performance, making untested changes a public problem that reduces development speed, increases the cost of development, and makes it difficult (if not impossible) to parse cause from effect. In this work, we used survival analysis as a cloud-native, time-efficient and precise method for predicting model performance in the presence of adversarial noise. For neural networks in particular, the relationships between the learning rate, batch size, training time, convergence time, and deployment cost are highly complex, so researchers generally rely on benchmark datasets to assess the ability of a model to generalize beyond the training data. To address this, we propose using accelerated failure time models to measure the effect of hardware choice, batch size, number of epochs, and test-set accuracy, using adversarial attacks to induce failures on a reference model architecture before deploying the model to the real world. We evaluate several GPU types and use the Tree Parzen Estimator to maximize model robustness and minimize model run-time simultaneously. This provides a way to evaluate the model and optimise it in a single step, while simultaneously allowing us to model the effect of model parameters on training time, prediction time, and accuracy. Using this technique, we demonstrate that newer, more-powerful hardware does decrease the training time, but at a monetary and power cost that far outpaces the marginal gains in accuracy.
https://arxiv.org/abs/2409.07609
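An accelerated failure time (AFT) model makes covariates act multiplicatively on the time scale rather than on the hazard; a Weibull AFT sketch in plain Python (the coefficients and shape parameter are illustrative, not fitted values from the paper):

```python
import math

def aft_survival(t, x, beta, k=1.5):
    """Weibull AFT survival function S(t | x) = exp(-(t / lambda(x))^k),
    with the time scale lambda(x) = exp(beta . x). Covariates (e.g.
    hardware choice, batch size) stretch or compress time-to-failure."""
    lam = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return math.exp(-((t / lam) ** k))

def aft_median(x, beta, k=1.5):
    """Median time-to-failure: the t solving S(t | x) = 0.5."""
    lam = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return lam * (math.log(2.0) ** (1.0 / k))
```

Here "failure" is the adversarially induced misclassification event, so a fitted beta directly quantifies how much a hardware or hyperparameter change accelerates or delays model failure.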
Networked robotic systems balance compute, power, and latency constraints in applications such as self-driving vehicles, drone swarms, and teleoperated surgery. A core problem in this domain is deciding when to offload a computationally expensive task to the cloud, a remote server, at the cost of communication latency. Task offloading algorithms often rely on precise knowledge of system-specific performance metrics, such as sensor data rates, network bandwidth, and machine learning model latency. While these metrics can be modeled during system design, uncertainties in connection quality, server load, and hardware conditions introduce real-time performance variations, hindering overall performance. We introduce PEERNet, an end-to-end and real-time profiling tool for cloud robotics. PEERNet enables performance monitoring on heterogeneous hardware through targeted yet adaptive profiling of system components such as sensors, networks, deep-learning pipelines, and devices. We showcase PEERNet's capabilities through networked robotics tasks, such as image-based teleoperation of a Franka Emika Panda arm and querying vision language models using an Nvidia Jetson Orin. PEERNet reveals non-intuitive behavior in robotic systems, such as asymmetric network transmission and bimodal language model output. Our evaluation underscores the effectiveness and importance of benchmarking in networked robotics, demonstrating PEERNet's adaptability. Our code is open-source and available at this http URL.
https://arxiv.org/abs/2409.06078
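The core of such profiling is repeated wall-clock measurement of each pipeline stage; a minimal sketch in the spirit of (but not copied from) PEERNet:

```python
import time

def profile(fn, *args, repeats=5):
    """Wall-clock profiling of one pipeline stage (sensor read, network
    round trip, model inference, ...), reporting per-call latency
    statistics over several repeats."""
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - t0)
    samples.sort()
    return {"min": samples[0],
            "median": samples[len(samples) // 2],
            "max": samples[-1]}
```

Reporting the spread rather than a single mean is what surfaces the non-intuitive behaviors the abstract mentions, such as bimodal latency distributions.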
With the rapid development of drone technology, accurate detection of Unmanned Aerial Vehicles (UAVs) has become essential for applications such as surveillance, security, and airspace management. In this paper, we propose a novel trajectory-guided method, the Patch Intensity Convergence (PIC) technique, which generates high-fidelity bounding boxes for UAV detection tasks without the need for manual labeling effort. The PIC technique forms the foundation for developing UAVDB, a database explicitly created for UAV detection. Unlike existing datasets, which often use low-resolution footage or focus on UAVs against simple backgrounds, UAVDB employs high-resolution video to capture UAVs at various scales, ranging from hundreds of pixels to nearly single-digit pixel sizes. This broad-scale variation enables comprehensive evaluation of detection algorithms across different UAV sizes and distances. Applying the PIC technique, we can also efficiently generate detection datasets from trajectory or positional data, even without size information. We extensively benchmark UAVDB using YOLOv8 series detectors, offering a detailed performance analysis. Our findings highlight UAVDB's potential as a vital database for advancing UAV detection, particularly in high-resolution and long-distance tracking scenarios.
https://arxiv.org/abs/2409.06490
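The abstract does not define PIC's exact convergence criterion, so the following is only a guess at the idea: grow square rings around a known trajectory point until the ring's mean intensity stops changing (i.e., the object boundary has been passed), and take that radius as the box extent:

```python
def ring_mean(img, cx, cy, half):
    """Mean intensity of the square ring at Chebyshev distance `half`
    from (cx, cy), clipped to the image bounds."""
    h, w = len(img), len(img[0])
    vals = []
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            if max(abs(x - cx), abs(y - cy)) == half and 0 <= y < h and 0 <= x < w:
                vals.append(img[y][x])
    return sum(vals) / len(vals)

def pic_bbox(img, cx, cy, tol=1.0, max_half=20):
    """Grow rings outward from a trajectory point; once consecutive ring
    means agree within `tol`, the background has been reached, so the
    previous radius bounds the object."""
    prev = ring_mean(img, cx, cy, 1)
    for half in range(2, max_half + 1):
        cur = ring_mean(img, cx, cy, half)
        if abs(cur - prev) < tol:
            return (cx - half + 1, cy - half + 1, cx + half - 1, cy + half - 1)
        prev = cur
    return (cx - max_half, cy - max_half, cx + max_half, cy + max_half)
```

This is how a trajectory (a point per frame) can be upgraded to bounding boxes without size information, which is the labeling-free property the abstract claims.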
Despite the growing impact of Unmanned Aerial Vehicles (UAVs) across various industries, most currently available solutions lack a robust autonomous navigation system to deal safely with the appearance of obstacles. This work presents an approach to autonomous UAV planning and navigation in scenarios in which safety and high maneuverability are required, due to the cluttered environment and the narrow spaces in which to move. The system combines an RRT* global planner with a newly proposed reactive planner, DWA-3D, which is an extension of the well-known DWA method for 2D robots. We provide a theoretical-empirical method for adjusting the parameters of the objective function to optimize, easing the classical difficulty of tuning them. An onboard LiDAR provides a 3D point cloud, which is projected onto an Octomap in which the planning and navigation decisions are made. There is no prior map; the system builds and updates the map online from the current and past LiDAR information included in the Octomap. Extensive real-world experiments were conducted to validate the system and to fine-tune the involved parameters. These experiments allowed us to provide a set of values that ensures safe operation across all the tested scenarios. Just by weighting two parameters, it is possible to prioritize either horizontal path alignment or vertical (height) tracking, enhancing vertical or lateral avoidance, respectively. Additionally, our DWA-3D planner is able to navigate successfully even in the absence of a global planner, or with one that does not consider the drone's size. Finally, the conducted experiments show that the computation time with the proposed parameters is not only bounded but also remains stable around 40 ms, regardless of scenario complexity.
https://arxiv.org/abs/2409.05421
Multicopter drones are becoming a key platform in several application domains, enabling precise on-the-spot sensing and/or actuation. We focus on the case where the drone must process the sensor data in order to decide, depending on the outcome, whether it needs to perform some additional action, e.g., more accurate sensing or some form of actuation. On the one hand, waiting for the computation to complete may waste time if it turns out that no further action is needed. On the other hand, if the drone starts moving toward the next point of interest before the computation ends, it may need to return to the previous point if some action must be taken. In this paper, we propose a learning approach that enables the drone to make informed decisions about whether or not to wait for the result of the computation, based on past experience gathered from previous missions. Through an extensive evaluation, we show that the proposed approach, when properly configured, outperforms several static policies by up to 25.8%, over a wide variety of scenarios, both where the probability of some action being required at a given point of interest remains stable and where this probability varies over time.
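The wait-or-go dilemma above can be framed as an expected-cost comparison, with the probability of an action being required estimated from past missions. This is a minimal sketch under assumed cost terms and a simple exponential-average estimator, not the paper's actual learned policy:

```python
def should_wait(p_action, t_compute, t_return):
    """Decide whether to hover and wait for the onboard computation.

    Waiting always costs the computation time t_compute spent hovering.
    Leaving early costs nothing if no action is needed, but incurs a
    round trip back to the point (t_return) with probability p_action.
    """
    expected_cost_wait = t_compute
    expected_cost_go = p_action * t_return
    return expected_cost_wait < expected_cost_go

class ActionRateEstimator:
    """Running estimate of p_action learned from previous missions
    (exponential moving average; alpha and p0 are illustrative)."""
    def __init__(self, alpha=0.1, p0=0.5):
        self.alpha, self.p = alpha, p0

    def update(self, action_was_needed):
        # Move the estimate toward the latest binary outcome.
        self.p += self.alpha * (float(action_was_needed) - self.p)
```

Because the estimator tracks recent outcomes, the rule adapts when the underlying probability drifts over time, which is the non-stationary scenario the evaluation also covers.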
https://arxiv.org/abs/2409.04764
Autonomous indoor navigation of UAVs presents numerous challenges, primarily due to the limited precision of GPS in enclosed environments. Additionally, UAVs' limited capacity to carry heavy or power-intensive sensors exacerbates the difficulty of achieving autonomous navigation indoors. This paper introduces an advanced system in which a drone autonomously navigates indoor spaces to locate a specific target, such as an unknown Amazon package, using only a single camera. Employing a deep learning approach, a deep reinforcement adaptive learning (DRAL) algorithm is trained to develop a control strategy that emulates the decision-making process of an expert pilot. We demonstrate the efficacy of our system through real-time simulations conducted in various indoor settings. We apply multiple visualization techniques to gain deeper insights into our trained network. Furthermore, we extend our approach to include an adaptive control algorithm for coordinating multiple drones to collaboratively lift an object in an indoor environment. Integrating our DRAL algorithm enables multiple UAVs to learn optimal control strategies that adapt to dynamic conditions and uncertainties. This innovation enhances the robustness and flexibility of indoor navigation and opens new possibilities for complex multi-drone operations in confined spaces. The proposed framework highlights significant advancements in adaptive control and deep reinforcement learning, offering robust solutions for complex multi-agent systems in real-world applications.
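The abstract does not detail the DRAL architecture, but the camera-to-discrete-command control loop it describes typically reduces to value-based action selection over image features. The sketch below is a generic DQN-style stand-in; the action set, feature dimension, and linear "network" are all assumptions for illustration:

```python
import numpy as np

# Hypothetical discrete motion commands for the drone.
ACTIONS = ["forward", "back", "left", "right", "up", "down", "yaw_l", "yaw_r"]

class TinyPolicy:
    """Stand-in for the learned network: a linear map from image
    features to Q-values over discrete motion commands."""
    def __init__(self, feat_dim=64, n_actions=len(ACTIONS), seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(feat_dim, n_actions))

    def q_values(self, features):
        return features @ self.W

def select_action(policy, features, epsilon, rng):
    """Epsilon-greedy action selection, as in standard value-based RL:
    explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(policy.q_values(features)))
```

In a full system, `features` would come from a convolutional encoder over the single camera frame, and the policy would be trained from expert-pilot-style rewards; only the selection step is shown here.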
https://arxiv.org/abs/2409.03930
Unmanned Aerial Vehicles (UAVs) have greatly revolutionized the process of gathering and analyzing data in diverse research domains, providing unmatched adaptability and effectiveness. This paper presents a thorough examination of UAV datasets, emphasizing their wide range of applications and progress. UAV datasets consist of various types of data, such as satellite imagery, images captured by drones, and videos. These datasets can be categorized as either unimodal or multimodal, offering a wide range of detailed and comprehensive information. They play a crucial role in disaster damage assessment, aerial surveillance, object recognition, and tracking, and they facilitate the development of sophisticated models for tasks like semantic segmentation, pose estimation, vehicle re-identification, and gesture recognition. By leveraging UAV datasets, researchers can significantly enhance the capabilities of computer vision models, thereby advancing technology and improving our understanding of complex, dynamic environments from an aerial perspective. This review aims to encapsulate the multifaceted utility of UAV datasets, emphasizing their pivotal role in driving innovation and practical applications in multiple domains.
https://arxiv.org/abs/2409.03245