Automatic monitoring of tree plantations plays a crucial role in agriculture. Reliable monitoring of tree health helps farmers make informed management decisions and take appropriate action. Drone images can make automatic plantation monitoring more accurate while remaining affordable for small farmers in developing countries such as India: small, low-cost drones equipped with an RGB camera can capture high-resolution images of agricultural fields, allowing detailed analysis of plantation health. Existing methods for automated plantation monitoring are mostly based on satellite images, which are difficult for farmers to obtain. We propose an automated system for plantation health monitoring using drone images, which are becoming increasingly accessible to farmers. We contribute a dataset of tree images with three categories: "Good health", "Stunted", and "Dead", annotated with the CVAT annotation tool for research use. We experiment with several well-known CNN models to assess their performance on the proposed dataset; the initially low accuracy levels reflect the dataset's difficulty. Our study further reveals that a depth-wise convolution operation embedded in a deep CNN model can improve performance on the drone dataset. Finally, we apply state-of-the-art object detection models to identify individual trees so that each tree can be monitored automatically.
https://arxiv.org/abs/2502.08233
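The depth-wise convolution this abstract credits with the performance gain factors a standard convolution into a per-channel spatial filter followed by a 1x1 channel mixer. Below is a minimal PyTorch sketch of such a block; the layer sizes and input shape are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: a 1x1 convolution mixes the channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Example: a 3-channel 256x256 crop of a single tree (illustrative shape).
x = torch.randn(1, 3, 256, 256)
block = DepthwiseSeparableConv(3, 32)
print(block(x).shape)  # torch.Size([1, 32, 256, 256])
```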
Accurate task planning is critical for controlling autonomous systems, such as robots, drones, and self-driving vehicles. Behavior Trees (BTs) are considered one of the most prominent control-policy-defining frameworks in task planning, due to their modularity, flexibility, and reusability. Generating reliable and accurate BT-based control policies for robotic systems remains challenging and often requires domain expertise. In this paper, we present the LLM-GP-BT technique that leverages the Large Language Model (LLM) and Genetic Programming (GP) to automate the generation and configuration of BTs. The LLM-GP-BT technique processes robot task commands expressed in human natural language and converts them into accurate and reliable BT-based task plans in a computationally efficient and user-friendly manner. The proposed technique is systematically developed and validated through simulation experiments, demonstrating its potential to streamline task planning for autonomous systems.
https://arxiv.org/abs/2502.07772
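For readers unfamiliar with BTs, the sketch below shows the minimal Sequence/Fallback machinery that gives them their modularity and reusability; the node names and toy task are illustrative, not the output format of LLM-GP-BT.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

class Sequence:
    """Tick children in order; fail fast, succeed only if all succeed."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS

class Fallback:
    """Tick children in order; return the first non-failure."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:
                return status
        return Status.FAILURE

class Action:
    def __init__(self, name, fn):
        self.name, self.fn = name, fn
    def tick(self):
        return self.fn()

# "Grasp the object if possible, otherwise ask for help" as a tiny tree.
tree = Fallback(
    Sequence(Action("at_object", lambda: Status.SUCCESS),
             Action("grasp", lambda: Status.SUCCESS)),
    Action("ask_for_help", lambda: Status.SUCCESS),
)
print(tree.tick())  # Status.SUCCESS
```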
Autonomous drone navigation in dynamic environments remains a critical challenge, especially when dealing with unpredictable scenarios such as fast-moving objects with rapidly changing goal positions. While traditional planners and classical optimisation methods have been extensively used to address this dynamic problem, they often struggle with real-time, unpredictable changes, which ultimately leads to sub-optimal performance in adaptiveness and real-time decision making. In this work, we propose AgilePilot, a novel motion planner based on Deep Reinforcement Learning (DRL) that is trained in dynamic conditions and coupled with real-time Computer Vision (CV) for object detection during flight. The training-to-deployment framework bridges the Sim2Real gap by leveraging sophisticated reward structures that promote both safety and agility depending on environment conditions. The system can rapidly adapt to changing environments while achieving a maximum speed of 3.0 m/s in real-world scenarios. Using velocity predictions, our approach outperforms classical algorithms such as an Artificial Potential Field (APF) based motion planner by a factor of three in both performance and tracking accuracy of dynamic targets, while exhibiting a 90% success rate across 75 conducted experiments. This work highlights the effectiveness of DRL in tackling real-time dynamic navigation challenges, offering intelligent safety and agility.
https://arxiv.org/abs/2502.06725
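As a rough illustration of a reward structure that trades agility against safety depending on environment conditions, consider the hedged sketch below; the terms, weights, and distance threshold are assumptions for illustration, not AgilePilot's actual reward.

```python
import numpy as np

def reward(pos, goal, vel, obstacle_dist,
           w_progress=1.0, w_agility=0.2, w_safety=2.0, d_safe=1.0):
    progress = -np.linalg.norm(goal - pos)   # move toward the goal
    agility = np.linalg.norm(vel)            # encourage speed...
    # ...but penalise obstacle proximity, dominating when too close.
    safety = -max(0.0, d_safe - obstacle_dist) ** 2
    return w_progress * progress + w_agility * agility + w_safety * safety

r = reward(pos=np.zeros(3), goal=np.array([5.0, 0.0, 0.0]),
           vel=np.array([3.0, 0.0, 0.0]), obstacle_dist=0.5)
```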
We present SIREN for registration of multi-robot Gaussian Splatting (GSplat) maps, with zero access to camera poses, images, and inter-map transforms for initialization or fusion of local submaps. To realize these capabilities, SIREN harnesses the versatility and robustness of semantics in three critical ways to derive a rigorous registration pipeline for multi-robot GSplat maps. First, SIREN utilizes semantics to identify feature-rich regions of the local maps where the registration problem is better posed, eliminating the need for any initialization which is generally required in prior work. Second, SIREN identifies candidate correspondences between Gaussians in the local maps using robust semantic features, constituting the foundation for robust geometric optimization, coarsely aligning 3D Gaussian primitives extracted from the local maps. Third, this key step enables subsequent photometric refinement of the transformation between the submaps, where SIREN leverages novel-view synthesis in GSplat maps along with a semantics-based image filter to compute a high-accuracy non-rigid transformation for the generation of a high-fidelity fused map. We demonstrate the superior performance of SIREN compared to competing baselines across a range of real-world datasets, and in particular, across the most widely-used robot hardware platforms, including a manipulator, drone, and quadruped. In our experiments, SIREN achieves about 90x smaller rotation errors, 300x smaller translation errors, and 44x smaller scale errors in the most challenging scenes, where competing methods struggle. We will release the code and provide a link to the project page after the review process.
https://arxiv.org/abs/2502.06519
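The coarse alignment step, fitting a transform to candidate correspondences between Gaussian primitives, can be illustrated with the classic Umeyama similarity fit below. SIREN's full pipeline is semantic and ultimately non-rigid; this sketch only shows the geometric core under simplified assumptions.

```python
import numpy as np

def umeyama(src: np.ndarray, dst: np.ndarray):
    """Least-squares similarity transform mapping src (N,3) onto dst (N,3)."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    cov = (dst - mu_d).T @ (src - mu_s) / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                        # keep a proper rotation
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / (src - mu_s).var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Sanity check on synthetic correspondences with scale 2 and a translation.
src = np.random.rand(100, 3)
dst = 2.0 * src + np.array([1.0, 0.0, 0.0])
print(umeyama(src, dst)[0])  # ~2.0
```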
This paper investigates the application of Deep Reinforcement Learning (DRL) to address motion control challenges in drones for additive manufacturing (AM). Drone-based additive manufacturing promises flexible and autonomous material deposition in large-scale or hazardous environments. However, achieving robust real-time control of a multi-rotor aerial robot under varying payloads and potential disturbances remains challenging. Traditional controllers like PID often require frequent parameter re-tuning, limiting their applicability in dynamic scenarios. We propose a DRL framework that learns adaptable control policies for multi-rotor drones performing waypoint navigation in AM tasks. We compare Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) within a curriculum learning scheme designed to handle increasing complexity. Our experiments show that TD3 consistently balances training stability, accuracy, and success, particularly when mass variability is introduced. These findings provide a scalable path toward robust, autonomous drone control in additive manufacturing.
https://arxiv.org/abs/2502.05996
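The differences between DDPG and TD3 that the comparison turns on are TD3's twin critics and target-policy smoothing. Below is a hedged sketch of the TD3 bootstrap target; the network names and hyperparameters are placeholders, not the paper's settings.

```python
import torch

def td3_target(reward, next_state, done, actor_tgt, q1_tgt, q2_tgt,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    """Compute the TD3 bootstrap target used to train both critics."""
    next_action = actor_tgt(next_state)
    # Target-policy smoothing: clipped noise on the target action.
    noise = (torch.randn_like(next_action) * noise_std
             ).clamp(-noise_clip, noise_clip)
    next_action = (next_action + noise).clamp(-max_action, max_action)
    # Clipped double-Q: take the smaller of the two target critics.
    q_next = torch.min(q1_tgt(next_state, next_action),
                       q2_tgt(next_state, next_action))
    return reward + gamma * (1.0 - done) * q_next
```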
Vision-based object tracking is a critical component for achieving autonomous aerial navigation, particularly for obstacle avoidance. Neuromorphic Dynamic Vision Sensors (DVS) or event cameras, inspired by biological vision, offer a promising alternative to conventional frame-based cameras. These cameras can detect changes in intensity asynchronously, even in challenging lighting conditions, with a high dynamic range and resistance to motion blur. Spiking neural networks (SNNs) are increasingly used to process these event-based signals efficiently and asynchronously. Meanwhile, physics-based artificial intelligence (AI) provides a means to incorporate system-level knowledge into neural networks via physical modeling. This enhances robustness and energy efficiency, and provides symbolic explainability. In this work, we present a neuromorphic navigation framework for autonomous drone navigation. The focus is on detecting and navigating through moving gates while avoiding collisions. We use event cameras for detecting moving objects through a shallow SNN architecture in an unsupervised manner. This is combined with a lightweight energy-aware physics-guided neural network (PgNN) trained with depth inputs to predict optimal flight times, generating near-minimum energy paths. The system is implemented in the Gazebo simulator and integrates a sensor-fused vision-to-planning neuro-symbolic framework built with the Robot Operating System (ROS) middleware. This work highlights the future potential of integrating event-based vision with physics-guided planning for energy-efficient autonomous navigation, particularly for low-latency decision-making.
https://arxiv.org/abs/2502.05938
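The shallow SNN processes the event stream through spiking units; the sketch below shows leaky integrate-and-fire (LIF) dynamics, the standard building block of such networks. The constants and layer width are illustrative, not the paper's configuration.

```python
import numpy as np

def lif_step(v, input_current, v_th=1.0, tau=0.9):
    """One timestep of a LIF layer: leak, integrate, spike, reset."""
    v = tau * v + input_current            # leaky integration
    spikes = (v >= v_th).astype(float)     # fire where threshold is crossed
    v = v * (1.0 - spikes)                 # hard reset for fired neurons
    return v, spikes

v = np.zeros(64)
for t in range(100):
    events = (np.random.rand(64) < 0.1).astype(float)  # binary event slice
    v, spikes = lif_step(v, events)
```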
Drones or unmanned aerial vehicles are traditionally used for military missions, warfare, and espionage. However, the usage of drones has significantly increased due to multiple industrial applications involving security and inspection, transportation, research purposes, and recreational drone flying. Such an increased volume of drone activity in public spaces requires regulatory action for purposes of privacy protection and safety. Hence, detection of illegal drone activities such as boundary encroachment becomes a necessity. Such detection tasks are usually automated and performed by deep learning models which are trained on annotated image datasets. This paper builds on previous work and extends an already published open-source dataset. A description and analysis of the entire dataset is provided. The dataset is used to train the YOLOv7 deep learning model and some of its minor variants, and the results are reported. Since the detection models operate on single image inputs, a simple cross-correlation based tracker is used to reduce detection drops and improve tracking performance in videos. Finally, the entire drone detection system is summarized.
https://arxiv.org/abs/2502.05292
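A cross-correlation tracker of the kind described can be as simple as matching the last confirmed detection crop against a search window in the next frame. The sketch below uses OpenCV's normalised cross-correlation; the search margin and interface are assumptions, not the paper's implementation.

```python
import cv2
import numpy as np

def track(prev_frame, next_frame, box, search_margin=32):
    """Relocate box=(x, y, w, h) in next_frame by template matching."""
    x, y, w, h = box
    template = prev_frame[y:y + h, x:x + w]
    # Restrict the search to a window around the previous position.
    x0, y0 = max(0, x - search_margin), max(0, y - search_margin)
    x1 = min(next_frame.shape[1], x + w + search_margin)
    y1 = min(next_frame.shape[0], y + h + search_margin)
    window = next_frame[y0:y1, x0:x1]
    scores = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
    _, score, _, (dx, dy) = cv2.minMaxLoc(scores)
    return (x0 + dx, y0 + dy, w, h), score  # new box and match confidence
```

When the detector drops a frame, the tracker's box can stand in for the missing detection as long as the correlation score stays above a threshold.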
Accurate people localisation using drones is crucial for effective crowd management, not only during massive events and public gatherings but also for monitoring daily urban crowd flow. Traditional methods for tiny object localisation using high-resolution drone imagery often face limitations in precision and efficiency, primarily due to constraints in image scaling and sliding window techniques. To address these challenges, a novel approach dedicated to point-oriented object localisation is proposed. Along with this approach, the Pixel Distill module is introduced to enhance the processing of high-definition images by extracting spatial information from individual pixels at once. Additionally, a new dataset named UP-COUNT, tailored to contemporary drone applications, is shared. It addresses a wide range of challenges in drone imagery, such as simultaneous camera and object movement during the image acquisition process, pushing forward the capabilities of crowd management applications. A comprehensive evaluation of the proposed method on the proposed dataset and the commonly used DroneCrowd dataset demonstrates the superiority of our approach over existing methods and highlights its efficacy in drone-based crowd object localisation tasks. These improvements markedly increase the algorithm's applicability to operate in real-world scenarios, enabling more reliable localisation and counting of individuals in dynamic environments.
https://arxiv.org/abs/2502.04014
Multi-object tracking (MOT) in UAV-based video is challenging due to variations in viewpoint, low resolution, and the presence of small objects. While most MOT research dedicated to aerial videos focuses on the academic aspect of developing sophisticated algorithms, the practical aspect of deploying these systems receives little attention. In this paper, we propose a novel real-time MOT framework that integrates Apache Kafka and Apache Spark for efficient and fault-tolerant video stream processing, along with state-of-the-art deep learning models YOLOv8/YOLOv10 and BYTETRACK/BoTSORT for accurate object detection and tracking. Our work highlights the importance of not only advanced algorithms but also their integration with scalable and distributed systems. By leveraging these technologies, our system achieves a HOTA of 48.14 and a MOTA of 43.51 on the Visdrone2019-MOT test set while maintaining a real-time processing speed of 28 FPS on a single GPU. Our work demonstrates the potential of big data technologies and deep learning for addressing the challenges of MOT in UAV applications.
https://arxiv.org/abs/2502.03760
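The streaming layer can be pictured as frames published to a Kafka topic by a producer and consumed by a detection-and-tracking process. The sketch below uses the kafka-python client; the topic name, broker address, and the detector hook are assumptions, not the paper's deployment.

```python
import cv2
import numpy as np
from kafka import KafkaProducer, KafkaConsumer

# Producer side: encode frames and publish them to a topic.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
cap = cv2.VideoCapture("drone_video.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    _, jpeg = cv2.imencode(".jpg", frame)
    producer.send("uav-frames", jpeg.tobytes())
producer.flush()

# Consumer side (a separate process): decode and feed detector + tracker.
consumer = KafkaConsumer("uav-frames", bootstrap_servers="localhost:9092")
for msg in consumer:
    frame = cv2.imdecode(np.frombuffer(msg.value, np.uint8), cv2.IMREAD_COLOR)
    # detections = yolo(frame); tracks = tracker.update(detections)
```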
Access to below-canopy volumetric vegetation data is crucial for understanding ecosystem dynamics. We address the long-standing limitation of remote sensing to penetrate deep into dense canopy layers. LiDAR and radar are currently considered the primary options for measuring 3D vegetation structures, while cameras can only extract the reflectance and depth of top layers. Using conventional, high-resolution aerial images, our approach allows sensing deep into self-occluding vegetation volumes, such as forests. It is similar in spirit to the imaging process of wide-field microscopy, but can handle much larger scales and strong occlusion. We scan focal stacks by synthetic-aperture imaging with drones and reduce out-of-focus signal contributions using pre-trained 3D convolutional neural networks. The resulting volumetric reflectance stacks contain low-frequency representations of the vegetation volume. Combining multiple reflectance stacks from various spectral channels provides insights into plant health, growth, and environmental conditions throughout the entire vegetation volume.
https://arxiv.org/abs/2502.02171
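Synthetic-aperture focusing registers each drone image to a chosen focal depth and averages: signal at that depth adds coherently while occluders blur away. Below is a simplified pinhole-model sketch under stated assumptions; it ignores the paper's learned suppression of out-of-focus contributions.

```python
import numpy as np

def synthetic_focus(images, offsets_m, depth_m, focal_px):
    """Average images shifted by their disparity at the chosen focus depth."""
    shifted = []
    for img, (ox, oy) in zip(images, offsets_m):
        dx = int(round(ox * focal_px / depth_m))  # disparity ~ f * b / z
        dy = int(round(oy * focal_px / depth_m))
        shifted.append(np.roll(img, shift=(dy, dx), axis=(0, 1)))
    return np.mean(shifted, axis=0)  # focal-plane signal adds coherently
```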
This paper introduces a learning-based visual planner for agile drone flight in cluttered environments. The proposed planner generates collision-free waypoints in milliseconds, enabling drones to perform agile maneuvers in complex environments without building separate perception, mapping, and planning modules. Learning-based methods, such as behavior cloning (BC) and reinforcement learning (RL), demonstrate promising performance in visual navigation but still face inherent limitations. BC is susceptible to compounding errors due to limited expert imitation, while RL struggles with reward function design and sample inefficiency. To address these limitations, this paper proposes an inverse reinforcement learning (IRL)-based framework for high-speed visual navigation. By leveraging IRL, it is possible to reduce the number of interactions with simulation environments and improve the capability to deal with high-dimensional spaces while preserving the robustness of RL policies. A motion primitive-based path planning algorithm collects an expert dataset with privileged map data from diverse environments, ensuring comprehensive scenario coverage. By leveraging both the acquired expert dataset and the learner dataset gathered from the agent's interactions with the simulation environments, a robust reward function and policy are learned across diverse states. While the proposed method is trained in a simulation environment only, it can be directly applied to real-world scenarios without additional training or tuning. The performance of the proposed method is validated in both simulation and real-world environments, including forests and various structures. The trained policy achieves an average speed of 7 m/s and a maximum speed of 8.8 m/s in real flight experiments. To the best of our knowledge, this is the first work to successfully apply an IRL framework for high-speed visual navigation of drones.
https://arxiv.org/abs/2502.02054
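One common way to learn a reward from expert and learner datasets, in the adversarial-IRL spirit, is to train a network that scores expert transitions above learner ones. The hedged sketch below illustrates that idea only; the dimensions are invented, and the paper's actual IRL objective may differ.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 32, 4                     # illustrative dimensions
reward_net = nn.Sequential(nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
                           nn.Linear(128, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=3e-4)

def irl_step(expert_batch, learner_batch):
    """One update: score expert state-action pairs above learner ones."""
    r_expert = reward_net(expert_batch)
    r_learner = reward_net(learner_batch)
    # Logistic loss: push expert rewards up and learner rewards down.
    loss = (nn.functional.softplus(-r_expert).mean()
            + nn.functional.softplus(r_learner).mean())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```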
Multi-agent reinforcement learning (MARL) has made significant progress, largely fueled by the development of specialized testbeds that enable systematic evaluation of algorithms in controlled yet challenging scenarios. However, existing testbeds often focus on purely virtual simulations or limited robot morphologies such as robotic arms, quadrupeds, and humanoids, leaving high-mobility platforms with real-world physical constraints like drones underexplored. To bridge this gap, we present VolleyBots, a new MARL testbed where multiple drones cooperate and compete in the sport of volleyball under physical dynamics. VolleyBots features a turn-based interaction model under volleyball rules, a hierarchical decision-making process that combines motion control and strategic play, and a high-fidelity simulation for seamless sim-to-real transfer. We provide a comprehensive suite of tasks ranging from single-drone drills to multi-drone cooperative and competitive tasks, accompanied by baseline evaluations of representative MARL and game-theoretic algorithms. Results in simulation show that while existing algorithms handle simple tasks effectively, they encounter difficulty in complex tasks that require both low-level control and high-level strategy. We further demonstrate zero-shot deployment of a simulation-learned policy to real-world drones, highlighting VolleyBots' potential to propel MARL research involving agile robotic platforms. The project page is at this https URL.
https://arxiv.org/abs/2502.01932
Digital agriculture technologies rely on sensors, drones, robots, and autonomous farm equipment to improve farm yields and incorporate sustainability practices. However, the adoption of such technologies is severely limited by the lack of broadband connectivity in rural areas. We argue that farming applications do not require permanent always-on connectivity. Instead, farming activity and digital agriculture applications follow seasonal rhythms of agriculture. Therefore, the need for connectivity is highly localized in time and space. We introduce BYON, a new connectivity model for high bandwidth agricultural applications that relies on emerging connectivity solutions like citizens broadband radio service (CBRS) and satellite networks. BYON creates an agile connectivity solution that can be moved along a farm to create spatio-temporal connectivity bubbles. BYON incorporates a new gateway design that reacts to the presence of crops and optimizes coverage in agricultural settings. We evaluate BYON in a production farm and demonstrate its benefits.
https://arxiv.org/abs/2502.01478
The design of multicopter drones has remained almost the same since their inception. While conventional designs, such as the quadcopter, work well in many cases, they may not be optimal in specific environments or missions. This paper revisits rotary drone design by exploring which body morphologies are optimal for different objectives and constraints. Specifically, an evolutionary algorithm is used to produce optimal drone morphologies for three objectives: (1) high thrust-to-weight ratio, (2) high maneuverability, and (3) small size. To generate a range of optimal drones with performance trade-offs between them, the non-dominated sorting genetic algorithm II, or NSGA-II, is used. A randomly sampled population of 600 is evolved over 2000 generations. The NSGA-II algorithm evolved drone bodies that outperform a standard 5-inch 220 mm wheelbase quadcopter in at least one of the three objectives. The three extrema in the Pareto front show improvements of 487.8%, 23.5%, and 4.8% in maneuverability, thrust-to-weight ratio, and size, respectively. The improvement in maneuverability can be attributed to the tilt angles of the propellers, while the increase in thrust-to-weight ratio is primarily due to the higher number of propellers. The quadcopter is located on the Pareto front for the three objectives. However, our results also show that other designs can be better depending on the objectives.
https://arxiv.org/abs/2502.01197
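NSGA-II's core is fast non-dominated sorting over multi-objective scores. Below is a minimal sketch of extracting the first Pareto front from candidate designs, with all three objectives expressed as maximisation (size negated); the example scores are invented for illustration.

```python
def dominates(a, b):
    """a dominates b if it is no worse everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def non_dominated_front(scores):
    """Indices of designs not dominated by any other design."""
    return [i for i, a in enumerate(scores)
            if not any(dominates(b, a) for j, b in enumerate(scores) if j != i)]

# (maneuverability, thrust-to-weight, -size); values are illustrative.
designs = [(4.0, 2.1, -0.22), (1.5, 2.6, -0.18), (2.0, 2.0, -0.30)]
print(non_dominated_front(designs))  # [0, 1]: the third design is dominated
```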
Drones have become prevalent robotic platforms with diverse applications, showing significant potential in Embodied Artificial Intelligence (Embodied AI). Referring Expression Comprehension (REC) enables drones to locate objects based on natural language expressions, a crucial capability for Embodied AI. Despite advances in REC for ground-level scenes, aerial views introduce unique challenges including varying viewpoints, occlusions, and scale variations. To address this gap, we introduce RefDrone, a REC benchmark for drone scenes. RefDrone reveals three key challenges in REC: 1) multi-scale and small-scale target detection; 2) multi-target and no-target samples; 3) complex environments with rich contextual expressions. To efficiently construct this dataset, we develop RDAgent (referring drone annotation framework with multi-agent system), a semi-automated annotation tool for REC tasks. RDAgent ensures high-quality contextual expressions and reduces annotation cost. Furthermore, we propose Number GroundingDINO (NGDINO), a novel method designed to handle multi-target and no-target cases. NGDINO explicitly learns and utilizes the number of objects referred to in the expression. Comprehensive experiments with state-of-the-art REC methods demonstrate that NGDINO achieves superior performance on both the proposed RefDrone and the existing gRefCOCO datasets. The dataset and code will be publicly available at this https URL.
https://arxiv.org/abs/2502.00392
Railroad bridges are a crucial component of the U.S. freight rail system, which moves over 40 percent of the nation's freight and plays a critical role in the economy. However, aging bridge infrastructure and increasing train traffic pose significant safety hazards and risk service disruptions. The U.S. rail network includes over 100,000 railroad bridges, averaging one every 1.4 miles of track, with steel bridges comprising over 50% of the network's total bridge length. Early identification and assessment of damage in these bridges remain challenging tasks. This study proposes a physics-informed neural network (PINN) based approach for damage identification in steel truss railroad bridges. The proposed approach employs unsupervised learning, eliminating the need for the large datasets typically required by supervised methods. The approach utilizes train wheel load data and bridge response during train crossing events as inputs for damage identification. The PINN model explicitly incorporates the governing differential equations of the linear time-varying (LTV) bridge-train system. Herein, this model employs a recurrent neural network (RNN) based architecture incorporating a custom Runge-Kutta (RK) integrator cell, designed for gradient-based learning. The proposed approach updates the bridge finite element model while also quantifying damage severity and localizing the affected structural members. A case study on the Calumet Bridge in Chicago, Illinois, with simulated damage scenarios, is used to demonstrate the model's effectiveness in identifying damage while maintaining low false-positive rates. Furthermore, the damage identification pipeline is designed to seamlessly integrate prior knowledge from inspections and drone surveys, also enabling context-aware updating and assessment of the bridge's condition.
https://arxiv.org/abs/2502.00194
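The custom RK integrator cell can be pictured as an RNN-style unit that advances the bridge-train ODE state by one step while remaining differentiable, so gradients reach the physical parameters. The sketch below is a generic RK4 cell under that assumption, with the dynamics function left as a placeholder.

```python
import torch
import torch.nn as nn

class RK4Cell(nn.Module):
    """RNN-style cell: one classic Runge-Kutta step of dx/dt = f(t, x)."""
    def __init__(self, dynamics, dt: float):
        super().__init__()
        self.f, self.dt = dynamics, dt

    def forward(self, x, t):
        h = self.dt
        k1 = self.f(t, x)
        k2 = self.f(t + h / 2, x + h / 2 * k1)
        k3 = self.f(t + h / 2, x + h / 2 * k2)
        k4 = self.f(t + h, x + h * k3)
        return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# Example: exponential decay dx/dt = -x, one step from x = 1.
cell = RK4Cell(lambda t, x: -x, dt=0.1)
print(cell(torch.ones(1), t=0.0))  # ~exp(-0.1) = 0.9048
```

Unrolling this cell over the crossing event gives the RNN structure; because every operation is differentiable, the loss on the predicted bridge response can update stiffness-like parameters inside the dynamics function.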
The integration of human-intuitive interactions into autonomous systems has been limited. Traditional Natural Language Processing (NLP) systems struggle with context and intent understanding, severely restricting human-robot interaction. Recent advancements in Large Language Models (LLMs) have transformed this dynamic, allowing for intuitive and high-level communication through speech and text, and bridging the gap between human commands and robotic actions. Additionally, autonomous navigation has emerged as a central focus in robotics research, with artificial intelligence (AI) increasingly being leveraged to enhance these systems. However, existing AI-based navigation algorithms face significant challenges in latency-critical tasks where rapid decision-making is critical. Traditional frame-based vision systems, while effective for high-level decision-making, suffer from high energy consumption and latency, limiting their applicability in real-time scenarios. Neuromorphic vision systems, combining event-based cameras and spiking neural networks (SNNs), offer a promising alternative by enabling energy-efficient, low-latency navigation. Despite their potential, real-world implementations of these systems, particularly on physical platforms such as drones, remain scarce. In this work, we present Neuro-LIFT, a real-time neuromorphic navigation framework implemented on a Parrot Bebop2 quadrotor. Leveraging an LLM for natural language processing, Neuro-LIFT translates human speech into high-level planning commands which are then autonomously executed using event-based neuromorphic vision and physics-driven planning. Our framework demonstrates its capabilities in navigating in a dynamic environment, avoiding obstacles, and adapting to human instructions in real-time.
https://arxiv.org/abs/2501.19259
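The speech-to-plan step can be sketched as an LLM constrained to a small whitelist of executable planner calls, whose output is parsed and validated before execution. The command set and prompt below are assumptions for illustration, not Neuro-LIFT's actual interface.

```python
COMMANDS = {"takeoff", "land", "goto", "hold"}

PROMPT = ("Translate the instruction into one command per line from "
          f"{sorted(COMMANDS)}, with numeric arguments, "
          "e.g. 'goto 1.0 2.0 1.5'.")

def parse_plan(llm_output: str):
    """Parse LLM output, keeping only whitelisted commands."""
    plan = []
    for line in llm_output.strip().splitlines():
        parts = line.split()
        if parts and parts[0] in COMMANDS:   # reject anything off-whitelist
            plan.append((parts[0], [float(a) for a in parts[1:]]))
    return plan

print(parse_plan("takeoff\ngoto 1.0 2.0 1.5\nland"))
```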
This report provides an overview of the workshop titled Autonomy and Safety Assurance in the Early Development of Robotics and Autonomous Systems, hosted by the Centre for Robotic Autonomy in Demanding and Long-Lasting Environments (CRADLE) on September 2, 2024, at The University of Manchester, UK. The event brought together representatives from six regulatory and assurance bodies across diverse sectors to discuss challenges and evidence for ensuring the safety of autonomous and robotic systems, particularly autonomous inspection robots (AIR). The workshop featured six invited talks by the regulatory and assurance bodies. CRADLE aims to make assurance an integral part of engineering reliable, transparent, and trustworthy autonomous systems. Key discussions revolved around three research questions: (i) challenges in assuring safety for AIR; (ii) evidence for safety assurance; and (iii) how assurance cases need to differ for autonomous systems. Following the invited talks, the breakout groups further discussed the research questions using case studies from ground (rail), nuclear, underwater, and drone-based AIR. This workshop offered a valuable opportunity for representatives from industry, academia, and regulatory bodies to discuss challenges related to assured autonomy. Feedback from participants indicated a strong willingness to adopt a design-for-assurance process to ensure that robots are developed and verified to meet regulatory expectations.
https://arxiv.org/abs/2501.18448
This paper presents a novel hybrid approach to solving real-world drone routing problems by leveraging the capabilities of quantum computing. The proposed method, coined Quantum for Drone Routing (Q4DR), integrates the two most prominent paradigms in the field: quantum gate-based computing, through the Eclipse Qrisp programming language; and quantum annealers, by means of D-Wave System's devices. The algorithm is divided into two different phases: an initial clustering phase executed using a Quantum Approximate Optimization Algorithm (QAOA), and a routing phase employing quantum annealers. The efficacy of Q4DR is demonstrated through three use cases of increasing complexity, each incorporating real-world constraints such as asymmetric costs, forbidden paths, and itinerant charging points. This research contributes to the growing body of work in quantum optimization, showcasing the practical applications of quantum computing in logistics and route planning.
https://arxiv.org/abs/2501.18432
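Annealer-friendly formulations cast the routing phase as a QUBO. The sketch below builds one for a tiny 3-waypoint tour with asymmetric costs: binary variable x[i, t] means waypoint i is visited at step t, and quadratic penalties enforce the assignment constraints. The penalty weight and instance are illustrative, not Q4DR's formulation.

```python
import itertools
import numpy as np

cost = np.array([[0, 2, 9],
                 [1, 0, 6],
                 [15, 7, 0]])        # asymmetric travel costs
n, A = 3, 20.0                       # penalty A must dominate the costs
Q = {}

def add(u, v, w):
    Q[(u, v)] = Q.get((u, v), 0.0) + w

for i, t in itertools.product(range(n), range(n)):
    add((i, t), (i, t), -2 * A)      # linear part of both one-hot penalties
for t in range(n):                   # exactly one waypoint per step
    for i, j in itertools.combinations(range(n), 2):
        add((i, t), (j, t), 2 * A)
for i in range(n):                   # each waypoint visited exactly once
    for t, u in itertools.combinations(range(n), 2):
        add((i, t), (i, u), 2 * A)
for t in range(n):                   # tour cost between consecutive steps
    for i, j in itertools.product(range(n), repeat=2):
        if i != j:
            add((i, t), (j, (t + 1) % n), cost[i, j])
# Q is now in the dict form accepted by QUBO samplers.
```

Forbidden paths fit the same scheme: set the corresponding cost entry to a large penalty so the annealer avoids that edge.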
Autonomous aerial monitoring is an important task aimed at gathering information from areas that may not be easily accessible by humans. At the same time, this task often requires recognizing anomalies from a significant distance or anomalies not encountered before. In this paper, we propose a novel framework that leverages the advanced capabilities provided by Large Language Models (LLMs) to actively collect information and perform anomaly detection in novel scenes. To this end, we propose an LLM-based model dialogue approach, in which two deep learning models engage in a dialogue to actively control a drone to increase perception and anomaly detection accuracy. We conduct our experiments in a high-fidelity simulation environment where an LLM is provided with a predetermined set of natural language movement commands mapped into executable code functions. Additionally, we deploy a multimodal Visual Question Answering (VQA) model charged with the task of visual question answering and captioning. By engaging the two models in conversation, the LLM asks exploratory questions while simultaneously flying a drone into different parts of the scene, providing a novel way to implement active perception. By leveraging the LLM's reasoning ability, we output an improved, detailed description of the scene that goes beyond existing static perception approaches. In addition to information gathering, our approach is utilized for anomaly detection, and our results demonstrate the proposed method's effectiveness in informing and alerting about potential hazards.
https://arxiv.org/abs/2501.16300
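The model dialogue can be pictured as a loop in which the LLM proposes a movement and an exploratory question, the drone executes the movement, and the VQA model answers from the new viewpoint. The sketch below is a schematic of that loop; llm(), vqa(), fly(), and capture() are placeholders, not the paper's components.

```python
def active_perception(llm, vqa, fly, capture, max_turns=5):
    """Alternate LLM-driven exploration with VQA visual grounding."""
    notes = []
    for _ in range(max_turns):
        # The LLM proposes a movement command and an exploratory question.
        command, question = llm(notes)
        fly(command)                       # e.g. "move_forward(2)" executed
        answer = vqa(capture(), question)  # VQA answers from the new view
        notes.append((question, answer))
    return llm(notes, summarize=True)      # final detailed scene description
```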