This paper presents a novel autonomous drone-based smoke plume tracking system capable of navigating and tracking plumes in highly unsteady atmospheric conditions. The system integrates advanced hardware, software, and a comprehensive simulation environment to ensure robust performance in both controlled and real-world settings. The quadrotor, equipped with a high-resolution imaging system and an advanced onboard computing unit, performs precise maneuvers while accurately detecting and tracking dynamic smoke plumes under fluctuating conditions. Our software implements a two-phase flight operation: descending into the smoke plume upon detection, then continuously monitoring the smoke movement during in-plume tracking. Leveraging a Proportional-Integral-Derivative (PID) controller and a Proximal Policy Optimization (PPO)-based Deep Reinforcement Learning (DRL) controller enables adaptation to plume dynamics. An Unreal Engine simulation evaluates performance under various smoke-wind scenarios, from steady flow to complex, unsteady fluctuations, showing that while the PID controller performs adequately in simpler scenarios, the DRL-based controller excels in more challenging environments. Field tests corroborate these findings. This system opens new possibilities for drone-based monitoring in areas such as wildfire management and air quality assessment, and the successful integration of DRL for real-time decision-making advances autonomous drone control in dynamic environments.
https://arxiv.org/abs/2504.12664
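As a concrete illustration of the two-phase control described above, the sketch below implements a generic PID loop that keeps a detected plume centroid at the image center. The gains, image size, and `track_step` interface are illustrative assumptions rather than details from the paper; the DRL controller would replace this loop with a learned PPO policy.

```python
import numpy as np

class PID:
    """Simple PID controller; the gains here are illustrative, not from the paper."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# In-plume tracking loop (hypothetical interface): keep the detected smoke
# centroid at the image center by commanding normalized lateral velocities.
pid_x = PID(kp=0.8, ki=0.05, kd=0.2, dt=0.05)
pid_y = PID(kp=0.8, ki=0.05, kd=0.2, dt=0.05)

def track_step(centroid_px, image_size=(640, 480)):
    # error = offset of the plume centroid from the image center, normalized
    ex = (centroid_px[0] - image_size[0] / 2) / image_size[0]
    ey = (centroid_px[1] - image_size[1] / 2) / image_size[1]
    vx_cmd = pid_x.step(ex)  # lateral velocity command
    vy_cmd = pid_y.step(ey)  # forward/backward velocity command
    return np.clip([vx_cmd, vy_cmd], -1.0, 1.0)
```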
With the rapid development of information technology, modern warfare increasingly relies on intelligence, making small target detection critical in military applications. The growing demand for efficient, real-time detection is complicated by interference when identifying small targets in complex environments. To address this, we propose a small target detection method based on multi-modal image fusion and attention mechanisms. This method builds on YOLOv5, integrating infrared and visible light data along with a convolutional attention module to enhance detection performance. The process begins with multi-modal dataset registration using feature point matching, ensuring accurate network training. By combining infrared and visible light features with attention mechanisms, the model improves detection accuracy and robustness. Experimental results on the anti-UAV and VisDrone datasets demonstrate the effectiveness and practicality of our approach, achieving superior detection results for small and dim targets.
https://arxiv.org/abs/2504.11262
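The registration step described above can be sketched with standard feature-point matching; ORB with a RANSAC homography is a stand-in choice, since the abstract does not specify the detector or transform model used.

```python
import cv2
import numpy as np

def register_ir_to_visible(ir_img, vis_img):
    """Estimate a homography from the IR image to the visible image via
    feature-point matching, then warp the IR image into the visible frame."""
    orb = cv2.ORB_create(nfeatures=2000)
    k1, d1 = orb.detectAndCompute(ir_img, None)
    k2, d2 = orb.detectAndCompute(vis_img, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # robust to outliers
    h, w = vis_img.shape[:2]
    return cv2.warpPerspective(ir_img, H, (w, h))  # IR aligned to visible frame
```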
Reliable traffic data are essential for understanding urban mobility and developing effective traffic management strategies. This study introduces the DRone-derived Intelligence For Traffic analysis (DRIFT) dataset, a large-scale urban traffic dataset collected systematically from synchronized drone videos at approximately 250 meters altitude, covering nine interconnected intersections in Daejeon, South Korea. DRIFT provides high-resolution vehicle trajectories that include directional information, processed through video synchronization and orthomap alignment, resulting in a comprehensive dataset of 81,699 vehicle trajectories. Through our DRIFT dataset, researchers can simultaneously analyze traffic at multiple scales, from individual vehicle maneuvers such as lane changes and safety metrics such as time-to-collision to aggregate network flow dynamics across interconnected urban intersections. The DRIFT dataset is structured to enable immediate use without additional preprocessing, complemented by open-source models for object detection and trajectory extraction, as well as associated analytical tools. DRIFT is expected to contribute significantly to academic research and practical applications, such as traffic flow analysis and simulation studies. The dataset and related resources are publicly accessible at this https URL.
https://arxiv.org/abs/2504.11019
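Safety metrics such as time-to-collision follow directly from trajectory data like DRIFT's. A minimal longitudinal TTC under the usual constant-velocity assumption is sketched below; the scalar along-lane interface is a simplification of what a real trajectory table would provide.

```python
import numpy as np

def time_to_collision(lead_pos, lead_vel, follow_pos, follow_vel):
    """Longitudinal TTC in seconds under a constant-velocity assumption.
    Positions and velocities are along the lane direction; returns np.inf
    when the follower is not closing on the leader."""
    gap = lead_pos - follow_pos
    closing_speed = follow_vel - lead_vel
    return gap / closing_speed if closing_speed > 1e-6 else np.inf

# e.g. a follower 25 m behind a leader, closing at 5 m/s -> TTC of 5 s
print(time_to_collision(100.0, 10.0, 75.0, 15.0))
```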
Adversarial patches are widely used to evaluate the robustness of object detection systems in real-world scenarios. These patches were initially designed to deceive single-modal detectors (e.g., visible or infrared) and have recently been extended to target visible-infrared dual-modal detectors. However, existing dual-modal adversarial patch attacks have limited attack effectiveness across diverse physical scenarios. To address this, we propose CDUPatch, a universal cross-modal patch attack against visible-infrared object detectors across scales, views, and scenarios. Specifically, we observe that color variations lead to different levels of thermal absorption, resulting in temperature differences in infrared imaging. Leveraging this property, we propose an RGB-to-infrared adapter that maps RGB patches to infrared patches, enabling unified optimization of cross-modal patches. By learning an optimal color distribution on the adversarial patch, we can manipulate its thermal response and generate an adversarial infrared texture. Additionally, we introduce a multi-scale clipping strategy and construct a new visible-infrared dataset, MSDrone, which contains aerial vehicle images at varying scales and perspectives. These data augmentation strategies enhance the robustness of our patch in real-world conditions. Experiments on four benchmark datasets (DroneVehicle, LLVIP, VisDrone, and MSDrone) show that our method outperforms existing patch attacks in the digital domain. Extensive physical tests further confirm strong transferability across scales, views, and scenarios.
https://arxiv.org/abs/2504.10888
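A minimal sketch of what an RGB-to-infrared adapter could look like, assuming a small convolutional network. The layer sizes and the joint-optimization snippet are assumptions; the abstract only states that RGB patches are mapped to infrared patches so that one patch can be optimized for both modalities.

```python
import torch
import torch.nn as nn

class RGB2IRAdapter(nn.Module):
    """Stand-in for the paper's RGB-to-infrared adapter: predicts a
    single-channel thermal response from an RGB patch, exploiting the
    color-dependent thermal absorption noted above. Layer sizes are
    assumptions; only the RGB-to-IR mapping role comes from the abstract."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),  # thermal intensity in [0, 1]
        )

    def forward(self, rgb_patch):
        return self.net(rgb_patch)

# Unified cross-modal optimization sketch: one learnable RGB patch whose IR
# appearance is rendered differentiably, so a single attack loss can cover
# both the visible and the infrared detector.
patch = torch.rand(1, 3, 64, 64, requires_grad=True)
adapter = RGB2IRAdapter()
ir_view = adapter(patch.clamp(0.0, 1.0))
```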
This work quantitatively evaluates the performance of event-based vision systems (EVS) against conventional RGB-based models for action prediction in collision avoidance on an FPGA accelerator. Our experiments demonstrate that the EVS model achieves a significantly higher effective frame rate (1 kHz) and lower temporal (by 20 ms) and spatial (by 20 mm) prediction errors compared to the RGB-based model, particularly when tested on out-of-distribution data. The EVS model also exhibits superior robustness in selecting optimal evasion maneuvers. In particular, in distinguishing between movement and stationary states, it achieves a 59 percentage point advantage in precision (78% vs. 19%) and a substantially higher F1 score (0.73 vs. 0.06), highlighting the susceptibility of the RGB model to overfitting. Further analysis across different combinations of spatial classes confirms the consistent performance of the EVS model on both test data sets. Finally, we evaluated the system end-to-end and achieved a latency of approximately 2.14 ms, with event aggregation (1 ms) and inference on the processing unit (0.94 ms) accounting for the largest components. These results underscore the advantages of event-based vision for real-time collision avoidance and demonstrate its potential for deployment in resource-constrained environments.
https://arxiv.org/abs/2504.10400
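Event aggregation into fixed 1 ms windows, matching the 1 kHz effective frame rate and the roughly 1 ms aggregation cost reported above, can be sketched as follows; the event record layout and the sensor resolution are assumptions.

```python
import numpy as np

def aggregate_events(events, window_us=1000, shape=(260, 346)):
    """Bin raw events into 1 ms count frames (the source of the ~1 kHz
    effective frame rate). `events` is a time-sorted array with columns
    (timestamp_us, x, y, polarity); the (260, 346) sensor size is assumed."""
    t0 = events[0, 0]
    n_bins = int((events[-1, 0] - t0) // window_us) + 1
    frames = np.zeros((n_bins, *shape), dtype=np.int16)
    bins = ((events[:, 0] - t0) // window_us).astype(int)
    for b, x, y, p in zip(bins, events[:, 1].astype(int),
                          events[:, 2].astype(int), events[:, 3]):
        frames[b, y, x] += 1 if p > 0 else -1  # signed polarity accumulation
    return frames
```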
Live tracking of wildlife via high-resolution video processing directly onboard drones remains largely unexplored, and most existing solutions rely on streaming video to ground stations to support navigation. Yet both autonomous animal-reactive flight control beyond visual line of sight and mission-specific individual and behaviour recognition tasks rely to some degree on this capability. In response, we introduce WildLive, a near real-time animal detection and tracking framework for high-resolution imagery running directly onboard uncrewed aerial vehicles (UAVs). The system performs multi-animal detection and tracking at 17+ fps on HD and 7+ fps on 4K video streams, suitable for operation during higher-altitude flights to minimise animal disturbance. Our system is optimised for the Jetson Orin AGX onboard hardware. It integrates the efficiency of sparse optical flow tracking and mission-specific sampling with device-optimised and proven YOLO-driven object detection and segmentation techniques. Essentially, computational resources are focused on spatio-temporal regions of high uncertainty to significantly improve UAV processing speeds without domain-specific loss of accuracy. Alongside the system, we introduce our WildLive dataset, which comprises 200k+ annotated animal instances across 19k+ frames from 4K UAV videos collected at the Ol Pejeta Conservancy in Kenya. All frames contain ground truth bounding boxes, segmentation masks, as well as individual tracklets and tracking point trajectories. We compare our system against current object tracking approaches including OC-SORT, ByteTrack, and SORT. Our multi-animal tracking experiments with onboard hardware confirm that near real-time high-resolution wildlife tracking is possible on UAVs whilst maintaining the high accuracy levels needed for future navigational and mission-specific animal-centric operational autonomy.
https://arxiv.org/abs/2504.10165
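The efficiency argument above rests on propagating tracks cheaply between detector invocations. A minimal sparse Lucas-Kanade sketch of that idea follows; the window size and pyramid depth are assumptions.

```python
import cv2
import numpy as np

def propagate_tracks(prev_gray, curr_gray, prev_pts):
    """Propagate animal keypoints with sparse Lucas-Kanade optical flow, so
    the heavy YOLO detector only needs to run on uncertain regions."""
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts.astype(np.float32), None,
        winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1
    return next_pts[good], good  # surviving points flag which tracks are certain

# Tracks that lose too many points (or drift) mark their region as uncertain,
# and only those crops are re-detected, saving onboard compute.
```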
Drones, like most aerial vehicles, face inherent disadvantages in achieving agile flight due to their limited thrust capabilities. These physical constraints cannot be fully addressed through advancements in control algorithms alone. Drawing inspiration from the winged flying squirrel, this paper proposes a highly maneuverable drone equipped with agility-enhancing foldable wings. By leveraging collaborative control between the conventional propeller system and the foldable wings, coordinated through the Thrust-Wing Coordination Control (TWCC) framework, the controllable acceleration set is expanded, enabling the generation of abrupt vertical forces that are unachievable with traditional wingless drones. The complex aerodynamics of the foldable wings are modeled using a physics-assisted recurrent neural network (paRNN), which calibrates the angle of attack (AOA) to align with the real aerodynamic behavior of the wings. The model is trained on real flight data and incorporates flat-plate aerodynamic principles. The additional air resistance generated by appropriately deploying these wings significantly improves the tracking performance of the proposed "flying squirrel" drone. Experimental results demonstrate that the proposed flying squirrel drone achieves a 13.1% improvement in tracking performance, as measured by root mean square error (RMSE), compared to a conventional wingless drone. A demonstration video is available on YouTube: this https URL.
https://arxiv.org/abs/2504.09609
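The flat-plate aerodynamic principles mentioned above can be made concrete with the textbook flat-plate coefficients. In the paper's framework the paRNN calibrates the effective AOA fed into such a model; the sketch below takes that calibrated AOA as given, and the example numbers are illustrative.

```python
import numpy as np

RHO = 1.225  # sea-level air density, kg/m^3

def flat_plate_forces(airspeed, alpha_rad, wing_area):
    """Flat-plate lift and drag in newtons. CL = 2*sin(a)*cos(a) and
    CD = 2*sin(a)^2 are the classic flat-plate approximations; the paRNN's
    role in the paper is to calibrate the effective AOA this model receives."""
    q_s = 0.5 * RHO * airspeed**2 * wing_area  # dynamic pressure times area
    cl = 2.0 * np.sin(alpha_rad) * np.cos(alpha_rad)
    cd = 2.0 * np.sin(alpha_rad) ** 2
    return q_s * cl, q_s * cd

# e.g. a 0.02 m^2 wing panel at 6 m/s and a 20 deg effective AOA
lift_n, drag_n = flat_plate_forces(6.0, np.radians(20.0), 0.02)
```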
We present an intuitive human-drone interaction system that utilizes a gesture-based motion controller to enhance the drone operation experience in real and simulated environments. The handheld motion controller enables natural control of the drone through the movements of the operator's hand, thumb, and index finger: the trigger press manages the throttle, the tilt of the hand adjusts pitch and roll, and the thumbstick controls yaw rotation. Communication with drones is facilitated via the ExpressLRS radio protocol, ensuring robust connectivity across various frequencies. A user evaluation of the flight experience with the designed drone controller using the UEQ-S survey showed high scores for both Pragmatic (mean = 2.2, SD = 0.8) and Hedonic (mean = 2.3, SD = 0.9) Qualities. This versatile control interface supports applications such as research, drone racing, and training programs in real and simulated environments, thereby contributing to advances in the field of human-drone interaction.
https://arxiv.org/abs/2504.09510
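The stated control mapping (trigger to throttle, hand tilt to pitch/roll, thumbstick to yaw) can be sketched as a plain channel-mapping function. The scaling and the 1000-2000 us channel convention are assumptions, since ExpressLRS simply transports whatever channel values it is handed.

```python
def controller_to_channels(trigger, roll_deg, pitch_deg, thumb_x):
    """Map handheld-controller state to four RC channels.
    trigger in [0, 1]; roll/pitch are hand-tilt angles in degrees;
    thumb_x in [-1, 1]. The +/-45 deg full-deflection scaling and the
    1000-2000 us pulse convention are assumptions, not paper values."""
    clamp = lambda v, lo, hi: max(lo, min(hi, v))
    to_us = lambda norm: int(1500 + 500 * clamp(norm, -1.0, 1.0))

    throttle = int(1000 + 1000 * clamp(trigger, 0.0, 1.0))  # trigger press
    roll = to_us(roll_deg / 45.0)    # hand tilt about the forward axis
    pitch = to_us(pitch_deg / 45.0)  # hand tilt about the lateral axis
    yaw = to_us(thumb_x)             # thumbstick x axis
    return throttle, roll, pitch, yaw

print(controller_to_channels(trigger=0.5, roll_deg=10.0, pitch_deg=-5.0, thumb_x=0.2))
```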
Typical multirotor drones are generally less maneuverable due to their unidirectional thrust, which may be unfavorable for agile flight in very narrow and confined spaces. This paper presents a new bio-inspired drone that achieves high maneuverability in a lightweight, easy-to-carry form. The proposed flying squirrel inspired drone has controllable foldable wings to cover a wider range of flight attitudes and provide more maneuverable flight capability with stable tracking performance. The wings of the drone are fabricated with silicone membranes and precisely controlled via reinforcement learning trained on human-demonstrated data. Notably, such learning-based wing control captures complex aerodynamics that are often impossible to model mathematically. Experiments show that the proposed flying squirrel drone intentionally induces aerodynamic drag and hence provides the desired additional repulsive force even under saturated mechanical thrust. This work demonstrates the potential of biomimicry and machine learning for realizing an animal-like agile drone.
https://arxiv.org/abs/2504.09478
Unmanned Aerial Vehicles (UAVs) are expected to transform logistics, reducing delivery time, costs, and emissions. This study addresses an on-demand delivery problem in which fleets of UAVs are deployed to fulfil orders that arrive stochastically. Unlike previous work, it considers UAVs with heterogeneous, unknown energy storage capacities and assumes no knowledge of the energy consumption models. We propose a decentralised deployment strategy that combines auction-based task allocation with online learning. Each UAV independently decides whether to bid for orders based on its energy storage charge level, the parcel mass, and the delivery distance. Over time, it refines its policy to bid only for orders within its capability. Simulations using realistic UAV energy models reveal that, counter-intuitively, assigning orders to the least confident bidders reduces delivery times and increases the number of successfully fulfilled orders. This strategy is shown to outperform threshold-based methods, which require UAVs to exceed specific charge levels at deployment. We propose a variant of the strategy that uses learned policies for forecasting. This enables UAVs with insufficient charge levels to commit to fulfilling orders at specific future times, helping to prioritise early orders. Our work provides new insights into the long-term deployment of UAV swarms, highlighting the advantages of decentralised, energy-aware decision-making coupled with online learning in real-world dynamic environments.
https://arxiv.org/abs/2504.08585
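A minimal sketch of the auction step with the counter-intuitive least-confident-bidder rule described above. The `Order`/`UAV` interfaces, the toy capability model, and the 0.5 bid threshold are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Order:
    mass_kg: float
    dist_km: float

class UAV:
    """Minimal stand-in: `capability_model` is the policy learned online,
    mapping (charge, parcel mass, delivery distance) to a confidence in
    [0, 1] that the order is within this UAV's unknown energy capability."""
    def __init__(self, charge, capability_model):
        self.charge = charge
        self.model = capability_model

    def confidence(self, order):
        return self.model(self.charge, order.mass_kg, order.dist_km)

    def can_attempt(self, order):
        return self.confidence(order) > 0.5  # bid threshold (assumption)

def allocate_order(order, fleet):
    # Counter-intuitive rule from the paper: among the UAVs willing to bid,
    # give the order to the LEAST confident one, keeping highly confident
    # UAVs free for orders that only they can serve.
    bids = [(uav.confidence(order), uav) for uav in fleet if uav.can_attempt(order)]
    return min(bids, key=lambda b: b[0])[1] if bids else None

# Toy capability estimate: confidence rises with charge, falls with effort.
toy = lambda q, m, d: 1.0 / (1.0 + math.exp(-(q - 10.0 * m * d) / 10.0))
fleet = [UAV(charge, toy) for charge in (40.0, 70.0, 95.0)]
winner = allocate_order(Order(mass_kg=0.5, dist_km=4.0), fleet)  # charge-40 UAV
```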
Detection Transformer-based methods have achieved significant advancements in general object detection. However, challenges remain in effectively detecting small objects. One key difficulty is that existing encoders struggle to efficiently fuse low-level features. Additionally, the query selection strategies are not effectively tailored for small objects. To address these challenges, this paper proposes an efficient model, Small Object Detection Transformer (SO-DETR). The model comprises three key components: a dual-domain hybrid encoder, an enhanced query selection mechanism, and a knowledge distillation strategy. The dual-domain hybrid encoder integrates spatial and frequency domains to fuse multi-scale features effectively. This approach enhances the representation of high-resolution features while maintaining relatively low computational overhead. The enhanced query selection mechanism optimizes query initialization by dynamically selecting high-scoring anchor boxes using expanded IoU, thereby improving the allocation of query resources. Furthermore, by incorporating a lightweight backbone network and implementing a knowledge distillation strategy, we develop an efficient detector for small objects. Experimental results on the VisDrone-2019-DET and UAVVaste datasets demonstrate that SO-DETR outperforms existing methods with similar computational demands. The project page is available at this https URL.
https://arxiv.org/abs/2504.11470
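The expanded IoU idea can be sketched directly: enlarging both boxes before computing IoU lets near-miss anchors around small objects receive non-zero scores during query selection. The expansion ratio below is an assumption.

```python
def expand_box(box, ratio):
    """Symmetrically enlarge an (x1, y1, x2, y2) box by `ratio`."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    w, h = (box[2] - box[0]) * ratio, (box[3] - box[1]) * ratio
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def expanded_iou(a, b, ratio=1.5):
    """IoU on enlarged boxes: small, near-miss anchors that plain IoU scores
    near zero still receive a usable score, so high-scoring anchors around
    small objects can seed the queries. The 1.5 ratio is an assumption."""
    a, b = expand_box(a, ratio), expand_box(b, ratio)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)
```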
Mission planning for a fleet of cooperative autonomous drones that serve distributed target points, in applications such as disaster response, environmental monitoring, and surveillance, is challenging, especially under partial observability, limited communication range, and uncertain environments. Traditional path-planning algorithms struggle in these scenarios, particularly when prior information is not available. To address these challenges, we propose a novel framework that integrates Graph Neural Networks (GNNs), Deep Reinforcement Learning (DRL), and transformer-based mechanisms for enhanced multi-agent coordination and collective task execution. Our approach leverages GNNs to model agent-agent and agent-goal interactions through adaptive graph construction, enabling efficient information aggregation and decision-making under constrained communication. A transformer-based message-passing mechanism, augmented with edge-feature-enhanced attention, captures complex interaction patterns, while a Double Deep Q-Network (Double DQN) with prioritized experience replay optimizes agent policies in partially observable environments. This integration is carefully designed to address specific requirements of multi-agent navigation, such as scalability, adaptability, and efficient task execution. Experimental results demonstrate superior performance, with 90% service provisioning and 100% grid coverage (node discovery), while reducing the average steps per episode to 200, compared to 600 for benchmark methods such as particle swarm optimization (PSO), greedy algorithms, and DQN.
https://arxiv.org/abs/2504.08195
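A sketch of the adaptive graph construction step under constrained communication, as described above; the edge features (relative offset plus distance) are a plausible choice rather than the paper's exact design.

```python
import numpy as np

def build_comm_graph(positions, comm_range):
    """Adaptive graph construction: agents are connected only while within
    communication range, so the GNN aggregates information that could
    actually be exchanged. Edge features carry relative geometry and would
    feed the edge-feature-enhanced attention."""
    n = len(positions)
    adj = np.zeros((n, n), dtype=bool)
    edge_feats = {}
    for i in range(n):
        for j in range(i + 1, n):
            delta = positions[j] - positions[i]
            dist = np.linalg.norm(delta)
            if dist <= comm_range:
                adj[i, j] = adj[j, i] = True
                edge_feats[(i, j)] = np.append(delta, dist)
    return adj, edge_feats

adj, feats = build_comm_graph(np.random.rand(6, 2) * 100, comm_range=40.0)
```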
In this work, we evaluate the use of aerial drone hover constraints in a multisensor fusion of ground robot and drone data to improve the localization performance of a drone. In particular, we build upon our prior work on cooperative localization between an aerial drone and a ground robot, which fuses data from LiDAR, inertial navigation, peer-to-peer ranging, an altimeter, and stereo vision, and we evaluate the incorporation of knowledge from the autopilot regarding when the drone is hovering. This control command data is leveraged to add constraints on the velocity state. Hover constraints can be considered important dynamic model information, analogous to the exploitation of zero-velocity updates in pedestrian navigation. We analyze the benefits of these constraints using incremental factor graph optimization. Experimental data collected in a motion capture facility is used to provide performance insights and assess the benefits of hover constraints.
https://arxiv.org/abs/2504.07843
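A hover constraint can be written as a zero-velocity factor on the velocity state. The sketch below shows the whitened residual such a factor would contribute; the noise level is an assumption, not a paper value.

```python
import numpy as np

def hover_velocity_residual(v_k, sigma_mps=0.05):
    """Whitened residual of a zero-velocity ("hover") factor on the 3D
    velocity state v_k, added only at time steps where the autopilot
    reports a hover command. sigma_mps encodes how tightly the hover
    assumption is trusted and is an assumption here."""
    return np.asarray(v_k) / sigma_mps  # factor cost is ||v_k - 0||^2 / sigma^2

# In the incremental factor graph, these residuals sit alongside the LiDAR,
# ranging, altimeter, and stereo-vision factors, playing the same role as
# zero-velocity updates in pedestrian navigation.
```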
The flight time of multirotor unmanned aerial vehicles (UAVs) is typically constrained by their high power consumption. Tethered power systems present a viable solution to extend flight times while maintaining the advantages of multirotor UAVs, such as hover capability and agility. This paper addresses the critical aspect of cable selection for tether-powered multirotor UAVs, considering both hover and forward flight. Existing research often overlooks the trade-offs between cable mass, power losses, and system constraints. We propose a novel methodology to optimize cable selection, accounting for thrust requirements and power efficiency across various flight conditions. The approach combines physics-informed modeling with system identification to capture both hover and forward flight dynamics, incorporating factors such as motor efficiency, tether resistance, and aerodynamic drag. This work provides an intuitive and practical framework for optimizing tethered UAV designs, ensuring efficient power transmission and flight performance, and thus allowing for better, safer, and more efficient tethered drones.
https://arxiv.org/abs/2504.07802
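The cable trade-off at the heart of this optimization is easy to state numerically: ohmic loss falls with conductor area while lifted mass rises with it. A worked sketch with illustrative copper values follows; the example operating point is an assumption.

```python
RHO_CU = 1.72e-8   # copper resistivity, ohm*m
DENS_CU = 8960.0   # copper density, kg/m^3

def tether_tradeoff(p_load_w, v_bus, length_m, area_m2):
    """Core trade-off in cable selection: thicker conductors cut I^2*R loss
    but add mass the rotors must lift. Two-conductor tether assumed."""
    r = RHO_CU * (2 * length_m) / area_m2       # out-and-back resistance
    i = p_load_w / v_bus                        # current drawn at the UAV
    p_loss = i**2 * r                           # ohmic loss in the tether
    mass = DENS_CU * (2 * length_m) * area_m2   # conductor mass, kg
    return p_loss, mass

# e.g. 1 kW delivered at 400 V over a 50 m tether with 1 mm^2 conductors:
loss_w, mass_kg = tether_tradeoff(1000.0, 400.0, 50.0, 1e-6)
# ~2.5 A -> ~10.8 W loss and ~0.9 kg of copper; halving the area halves the
# mass but doubles the loss, which is exactly the optimization surface.
```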
Real-time wildlife detection in drone imagery is critical for numerous applications, including animal ecology, conservation, and biodiversity monitoring. Low-altitude drone missions are effective for collecting fine-grained animal movement and behavior data, particularly when missions are automated for increased speed and consistency. However, little work exists on evaluating computer vision models on low-altitude aerial imagery or on their generalizability across different species and settings. To fill this gap, we present a novel multi-environment, multi-species, low-altitude aerial footage (MMLA) dataset. MMLA consists of drone footage collected across three diverse environments, Ol Pejeta Conservancy and Mpala Research Centre in Kenya and The Wilds Conservation Center in Ohio, and covers five species: Plains zebras, Grevy's zebras, giraffes, onagers, and African Painted Dogs. We comprehensively evaluate three YOLO models (YOLOv5m, YOLOv8m, and YOLOv11m) for detecting animals. Results demonstrate significant performance disparities across locations as well as species-specific detection variations. Our work highlights the importance of evaluating detection algorithms across different environments for robust wildlife monitoring applications using drones.
https://arxiv.org/abs/2504.07744
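A per-environment evaluation loop in the spirit of this study can be sketched with the ultralytics API; the dataset YAML names are placeholders and the exact weights and training protocol are assumptions.

```python
from ultralytics import YOLO

# Dataset YAML names are placeholders; the point is reporting mAP per site
# and per model rather than pooling everything into one number.
SITES = ["ol_pejeta", "mpala", "the_wilds"]
WEIGHTS = ["yolov5mu.pt", "yolov8m.pt", "yolo11m.pt"]

for site in SITES:
    for weights in WEIGHTS:
        model = YOLO(weights)
        metrics = model.val(data=f"mmla_{site}.yaml", split="test")
        print(f"{site} {weights} mAP50-95={metrics.box.map:.3f}")
```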
We introduce ThermoStereoRT, a real-time, all-weather thermal stereo matching method that recovers disparity from two rectified thermal stereo images, envisioning applications such as night-time drone surveillance or under-bed cleaning robots. Leveraging a lightweight yet powerful backbone, ThermoStereoRT constructs a 3D cost volume from thermal images and employs multi-scale attention mechanisms to produce an initial disparity map. To refine this map, we design a novel channel and spatial attention module. Addressing the challenge of sparse ground truth data in thermal imagery, we utilize knowledge distillation to boost performance without increasing computational demands. Comprehensive evaluations on multiple datasets demonstrate that ThermoStereoRT delivers both real-time capability and robust accuracy, making it a promising solution for real-world deployment in various challenging environments. Our code will be released on this https URL
https://arxiv.org/abs/2504.07418
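The 3D cost volume construction can be sketched as a correlation volume over candidate disparities; the correlation form is an assumption, as the abstract only states that a 3D cost volume is built from thermal features.

```python
import torch

def build_cost_volume(feat_l, feat_r, max_disp):
    """Correlation-style 3D cost volume over candidate disparities.
    feat_l/feat_r: (B, C, H, W) features from the shared backbone; the
    correlation form is an assumption, as is max_disp."""
    b, c, h, w = feat_l.shape
    volume = feat_l.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            volume[:, d] = (feat_l * feat_r).mean(dim=1)
        else:
            # shift the right features by d pixels before correlating
            volume[:, d, :, d:] = (feat_l[..., d:] * feat_r[..., :-d]).mean(dim=1)
    return volume  # a soft-argmin over dim=1 yields the initial disparity map
```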
The autonomy of Unmanned Aerial Vehicles (UAVs) in indoor environments poses significant challenges due to the lack of reliable GPS signals in enclosed spaces such as warehouses, factories, and indoor facilities. Micro Aerial Vehicles (MAVs) are preferred for navigating these complex, GPS-denied scenarios because of their agility and low power consumption, despite their limited computational capabilities. In this paper, we propose a Reinforcement Learning based Deep-Proximal Policy Optimization (D-PPO) algorithm to enhance real-time navigation by improving computational efficiency. The end-to-end network is trained in 3D realistic meta-environments created using the Unreal Engine. With these trained meta-weights, the MAV system underwent extensive experimental trials in real-world indoor environments. The results indicate that the proposed method reduces computational latency by 91% during the training period without significant degradation in performance. The algorithm was also tested on a DJI Tello drone, yielding similar results.
https://arxiv.org/abs/2504.05918
Object detection in Unmanned Aerial Vehicle (UAV) images poses significant challenges due to complex scale variations and class imbalance among objects. Existing methods often address these challenges separately, overlooking the intricate nature of UAV images and the potential synergy between them. In response, this paper proposes AD-Det, a novel framework employing a coherent coarse-to-fine strategy that seamlessly integrates two pivotal components: Adaptive Small Object Enhancement (ASOE) and Dynamic Class-balanced Copy-paste (DCC). ASOE utilizes a high-resolution feature map to identify and cluster regions containing small objects. These regions are subsequently enlarged and processed by a fine-grained detector. On the other hand, DCC conducts object-level resampling by dynamically pasting tail classes around the cluster centers obtained by ASOE, maintaining a dynamic memory bank for each tail class. This approach enables AD-Det to not only extract regions with small objects for precise detection but also dynamically perform reasonable resampling for tail-class objects. Consequently, AD-Det enhances the overall detection performance by addressing the challenges of scale variations and class imbalance in UAV images through a synergistic and adaptive framework. We extensively evaluate our approach on two public datasets, i.e., VisDrone and UAVDT, and demonstrate that AD-Det significantly outperforms existing competitive alternatives. Notably, AD-Det achieves a 37.5% Average Precision (AP) on the VisDrone dataset, surpassing its counterparts by at least 3.1%.
https://arxiv.org/abs/2504.05601
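A minimal sketch of the ASOE idea described above: cluster the centers of small coarse-pass detections and emit enlarged crops for the fine-grained detector. The size threshold, DBSCAN parameters, and padding are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def small_object_regions(boxes, small_thresh=32.0, eps=120.0, pad=64):
    """Cluster centers of small coarse-pass detections and return enlarged
    crop regions for the fine-grained detector. `boxes` is an (N, 4) array
    of (x1, y1, x2, y2); all thresholds are assumptions."""
    sizes = np.sqrt((boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]))
    small = boxes[sizes < small_thresh]
    if len(small) == 0:
        return []
    centers = np.stack([(small[:, 0] + small[:, 2]) / 2,
                        (small[:, 1] + small[:, 3]) / 2], axis=1)
    labels = DBSCAN(eps=eps, min_samples=1).fit_predict(centers)
    regions = []
    for k in np.unique(labels):
        pts = centers[labels == k]
        x0, y0 = pts.min(axis=0) - pad
        x1, y1 = pts.max(axis=0) + pad
        regions.append((max(0.0, x0), max(0.0, y0), x1, y1))  # crop, re-detect
    return regions
```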
As autonomous agents become more powerful and widely used, it is becoming increasingly important to ensure they behave safely and stay aligned with system goals, especially in multi-agent settings. Current systems often rely on agents self-monitoring or correcting issues after the fact, but they lack mechanisms for real-time oversight. This paper introduces the Enforcement Agent (EA) Framework, which embeds dedicated supervisory agents into the environment to monitor others, detect misbehavior, and intervene through real-time correction. We implement this framework in a custom drone simulation and evaluate it across 90 episodes using 0, 1, and 2 EA configurations. Results show that adding EAs significantly improves system safety: success rates rise from 0.0% with no EA to 7.4% with one EA and 26.7% with two EAs. The system also demonstrates increased operational longevity and higher rates of malicious drone reformation. These findings highlight the potential of lightweight, real-time supervision for enhancing alignment and resilience in multi-agent systems.
https://arxiv.org/abs/2504.04070
Cyber-physical systems (CPS) designed in simulators, often consisting of multiple interacting agents, behave differently in the real world. We would like to verify these systems at runtime once they are deployed. Thus, we propose robust predictive runtime verification (RPRV) algorithms for: (1) general stochastic CPS under signal temporal logic (STL) tasks, and (2) stochastic multi-agent systems (MAS) under spatio-temporal logic tasks. The RPRV problem presents the following challenges: (1) there may not be sufficient data on the behavior of the deployed CPS, (2) predictive models based on design-phase system trajectories may encounter distribution shift during real-world deployment, and (3) the algorithms need to scale to the complexity of MAS and be applicable to spatio-temporal logic tasks. To address these challenges, we assume knowledge of an upper bound on the statistical distance (in terms of an f-divergence) between the trajectory distributions of the system at deployment and design time. We are motivated by our prior work [1, 2] in which we proposed an accurate and an interpretable RPRV algorithm for general CPS, which we here extend to the MAS setting and spatio-temporal logic tasks. Specifically, we use a learned predictive model to estimate the system behavior at runtime and robust conformal prediction to obtain probabilistic guarantees by accounting for distribution shifts. Building on [1], we perform robust conformal prediction over the robust semantics of spatio-temporal reach and escape logic (STREL) to obtain centralized RPRV algorithms for MAS. We empirically validate our results in a drone swarm simulator, where we show the scalability of our RPRV algorithms to MAS and analyze the impact of different trajectory predictors on the verification result. To the best of our knowledge, these are the first statistically valid algorithms for MAS under distribution shift.
https://arxiv.org/abs/2504.02964
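The robust conformal step can be illustrated in its simplest special case: for a total-variation shift bound, inflating the calibration coverage level compensates for the distribution shift. The sketch below assumes that special case; the paper's framework handles general f-divergences.

```python
import numpy as np

def robust_conformal_quantile(scores, delta=0.1, eps=0.05):
    """Split-conformal quantile made robust to distribution shift.
    scores: nonconformity scores on design-time calibration data.
    For a total-variation shift bound eps, raising the coverage level from
    1 - delta to 1 - delta + eps is a simple special case of the
    f-divergence adjustment used in the paper's framework."""
    level = min(1.0, 1.0 - delta + eps)
    n = len(scores)
    k = int(np.ceil((n + 1) * level))  # finite-sample correction
    return np.sort(scores)[min(k, n) - 1]

# At runtime: predict the trajectory, evaluate the STREL robust semantics of
# the prediction, and certify the task if its value exceeds this quantile
# margin, which then holds with probability >= 1 - delta despite the shift.
```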