Accurate and resilient object detection for structural damage is important for ensuring the continued use of civil infrastructure. However, achieving robustness in object detectors remains a persistent challenge, impacting their ability to generalize effectively. This study proposes DetectorX, a robust framework for structural damage detection coupled with a micro drone. DetectorX addresses the challenges of object detector robustness by incorporating two innovative modules: a stem block and a spiral pooling technique. The stem block introduces a dynamic visual modality by leveraging the outputs of two Deep Convolutional Neural Network (DCNN) models. The framework employs the proposed event-based reward reinforcement learning to constrain the actions of a parent and child DCNN model, leading to a reward. This results in the induction of two dynamic visual modalities alongside the Red, Green, and Blue (RGB) data. This enhancement significantly augments DetectorX's perception and adaptability in diverse environmental situations. Further, a spiral pooling technique, an online image-augmentation method, strengthens the framework by enriching feature representations through the concatenation of spiraled and average/max-pooled features. In three extensive experiments, (1) a comparative study and (2) a robustness study, both using the Pacific Earthquake Engineering Research Hub ImageNet dataset, and (3) a field experiment, DetectorX performed satisfactorily across varying metrics, including precision (0.88), recall (0.84), average precision (0.91), mean average precision (0.76), and mean average recall (0.73), compared to competing detectors including You Only Look Once X-medium (YOLOX-m) and others. The study's findings indicate that DetectorX can provide satisfactory results and demonstrate resilience in challenging environments.
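The abstract does not specify the spiral pooling operation. As a rough illustration only, the sketch below flattens a feature map in clockwise spiral order, bins that sequence, and concatenates the result with average- and max-pooled values; the function names and the binning step are assumptions, not the authors' implementation.

```python
import numpy as np

def spiral_flatten(fmap: np.ndarray) -> np.ndarray:
    """Read a 2-D feature map in clockwise spiral order (outside-in)."""
    m = fmap.tolist()
    out = []
    while m:
        out.extend(m.pop(0))                      # top row
        if m and m[0]:
            for row in m:
                out.append(row.pop())             # right column, downward
        if m:
            out.extend(m.pop()[::-1])             # bottom row, reversed
        if m and m[0]:
            for row in m[::-1]:
                out.append(row.pop(0))            # left column, upward
    return np.array(out)

def spiral_pool(fmap: np.ndarray, k: int = 4) -> np.ndarray:
    """Concatenate spiral-ordered features with avg/max pooled summaries."""
    spiral = spiral_flatten(fmap)
    # coarse k-bin average of the spiral sequence keeps ordering information
    bins = np.array_split(spiral, k)
    spiral_feat = np.array([b.mean() for b in bins])
    return np.concatenate([spiral_feat, [fmap.mean()], [fmap.max()]])

x = np.arange(16, dtype=float).reshape(4, 4)
print(spiral_pool(x))
```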
https://arxiv.org/abs/2501.08807
Flooding is a major natural hazard causing significant fatalities and economic losses annually, with increasing frequency due to climate change. Rapid and accurate flood detection and monitoring are crucial for mitigating these impacts. This study compares the performance of three deep learning models (UNet, ResNet, and DeepLabv3) for pixel-wise water segmentation to aid in flood detection, utilizing images from drones, field observations, and social media. The study involves creating a new dataset that augments well-known benchmark datasets with flood-specific images, enhancing the robustness of the models. The UNet, ResNet, and DeepLabv3 architectures are tested to determine their effectiveness in various environmental conditions and geographical locations, and the strengths and limitations of each model are also discussed, providing insights into their applicability in different scenarios by predicting image segmentation masks. This fully automated approach allows these models to isolate flooded areas in images, significantly reducing processing time compared to traditional semi-automated methods. The outcome of this study is predicted segmentation masks for each image affected by a flood disaster, together with the validation accuracy of these models. This methodology facilitates timely and continuous flood monitoring, providing vital data for emergency response teams to reduce loss of life and economic damage. It offers a significant reduction in the time required to generate flood maps, cutting down manual processing time. Additionally, we present avenues for future research, including the integration of multimodal data sources and the development of robust deep learning architectures tailored specifically for flood detection tasks. Overall, our work contributes to the advancement of flood management strategies through innovative use of deep learning technologies.
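For context on how such segmentation models are commonly scored, here is a minimal numpy sketch of per-image IoU and pixel accuracy for a binary water mask; it is a generic metric implementation, not the authors' evaluation code.

```python
import numpy as np

def water_mask_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Pixel-wise scores for a binary water-segmentation mask.

    pred, truth: boolean arrays of shape (H, W), True = water pixel.
    """
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    iou = inter / union if union else 1.0          # two empty masks agree
    acc = (pred == truth).mean()                   # plain pixel accuracy
    return {"iou": float(iou), "pixel_accuracy": float(acc)}

pred = np.zeros((4, 4), bool); pred[1:3, 1:4] = True
truth = np.zeros((4, 4), bool); truth[1:3, 0:3] = True
print(water_mask_metrics(pred, truth))  # IoU = 4/8 = 0.5
```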
https://arxiv.org/abs/2501.08266
We consider the spatial classification problem for monitoring, using data collected by a coordinated team of mobile robots. Such classification problems arise in several applications including search-and-rescue and precision agriculture. Specifically, we want to classify the regions of a search environment into interesting and uninteresting as quickly as possible using a team of mobile sensors and mobile charging stations. We develop a data-driven strategy that accommodates the noise in sensed data and the limited energy capacity of the sensors, and generates collision-free motion plans for the team. We propose a bi-level approach, where a high-level planner leverages a multi-armed bandit framework to determine the potential regions of interest for the drones to visit next based on the data collected online. Then, a low-level path planner based on integer programming coordinates the paths for the team to visit the target regions subject to the physical constraints. We characterize several theoretical properties of the proposed approach, including anytime guarantees and task completion time. We show the efficacy of our approach in simulation, and further validate these observations in physical experiments using mobile robots.
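The abstract does not name the bandit algorithm the high-level planner uses; a plain UCB1 selection loop over candidate regions, as sketched below, is one plausible instantiation, with the reward model purely illustrative.

```python
import math
import random

def ucb1_pick(counts, means, t, c=2.0):
    """UCB1: pick the region index maximizing mean + exploration bonus."""
    for i, n in enumerate(counts):
        if n == 0:                     # visit every region at least once
            return i
    scores = [means[i] + math.sqrt(c * math.log(t) / counts[i])
              for i in range(len(counts))]
    return max(range(len(counts)), key=scores.__getitem__)

# toy loop: region 2 is truly "interesting" (high expected reward)
true_p = [0.2, 0.3, 0.8, 0.1]
counts = [0] * 4
means = [0.0] * 4
for t in range(1, 201):
    i = ucb1_pick(counts, means, t)
    r = 1.0 if random.random() < true_p[i] else 0.0   # noisy sensed label
    counts[i] += 1
    means[i] += (r - means[i]) / counts[i]            # running mean update
print(counts)   # most visits should concentrate on region 2
```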
https://arxiv.org/abs/2501.08222
Modern machine learning techniques have shown tremendous potential, especially for object detection on camera images. For this reason, they are also used to enable safety-critical automated processes such as autonomous drone flights. We present a study on object detection for Detect and Avoid, a safety-critical function for drones that detects air traffic during automated flights for safety reasons. An ill-posed part of the problem is the generation of good, and especially large, datasets, since the detection itself is the corner case. Most models suffer from limited ground truth in raw data, e.g., recorded air traffic or frontal flight with a small aircraft. This often leads to poor detection rates in critical situations. We overcome this problem by using inpainting methods to bootstrap the dataset such that it explicitly contains the corner cases of the raw data. We provide an overview of inpainting methods and generative models and present an example pipeline given a small annotated dataset. We validate our method by generating a high-resolution dataset, which we make publicly available, and by presenting it to an independent object detector that was fully trained on real data.
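The paper covers learned inpainting and generative models; as a simplified classical stand-in, the sketch below erases an annotated aircraft with OpenCV's inpainting and re-pastes the patch at a new position to synthesize a new annotated sample. The compositing scheme and all names are illustrative, not the paper's pipeline.

```python
import cv2
import numpy as np

def synthesize_sample(img, box, new_xy):
    """Bootstrap one new training image from an annotated frame.

    img: BGR frame containing one aircraft; box: (x, y, w, h) annotation.
    Erases the original aircraft via classical inpainting, then pastes the
    patch at new_xy, returning the new image and its new bounding box.
    """
    x, y, w, h = box
    patch = img[y:y + h, x:x + w].copy()
    mask = np.zeros(img.shape[:2], np.uint8)
    mask[y:y + h, x:x + w] = 255
    clean = cv2.inpaint(img, mask, 5, cv2.INPAINT_TELEA)   # plausible sky fill
    nx, ny = new_xy
    clean[ny:ny + h, nx:nx + w] = patch                    # paste corner case
    return clean, (nx, ny, w, h)

frame = np.full((240, 320, 3), 200, np.uint8)              # toy grey "sky"
frame[100:110, 150:170] = (30, 30, 30)                     # toy "aircraft"
aug, new_box = synthesize_sample(frame, (150, 100, 20, 10), (40, 60))
print(new_box)
```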
https://arxiv.org/abs/2501.08142
This paper introduces a safe swarm of drones capable of performing landings in crowded environments robustly by relying on Reinforcement Learning techniques combined with Safe Learning. The developed system allows us to teach a swarm of drones with different dynamics to land on moving landing pads while avoiding collisions with obstacles and between agents. The safe barrier net algorithm was developed and evaluated using a swarm of Crazyflie 2.1 micro quadrotors, tested indoors with the Vicon motion capture system to ensure precise localization and control. Experimental results show that our system achieves a landing accuracy of 2.25 cm with a mean time of 17 s and collision-free landings, underscoring its effectiveness and robustness in real-world scenarios. This work offers a promising foundation for applications in environments where safety and precision are paramount.
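The safe barrier net itself is learned, but the safety idea it encodes can be illustrated with a hand-written control-barrier-function filter on a single-integrator agent. This is a generic sketch under that simplified model, not the authors' algorithm.

```python
import numpy as np

def cbf_filter(u_nom, x, x_obs, r=0.5, alpha=2.0):
    """Minimally modify u_nom so the single-integrator state x (dx/dt = u)
    keeps the barrier h(x) = ||x - x_obs||^2 - r^2 nonnegative:
    enforce dh/dt >= -alpha * h, i.e. a.u >= -alpha*h with a = 2(x - x_obs).
    """
    a = 2.0 * (x - x_obs)
    h = float(np.dot(x - x_obs, x - x_obs) - r * r)
    slack = float(np.dot(a, u_nom)) + alpha * h
    if slack >= 0.0:                       # nominal command already safe
        return u_nom
    # closed-form projection onto the half-space a.u >= -alpha*h
    return u_nom - slack * a / float(np.dot(a, a))

x = np.array([1.0, 0.0])                   # drone position
x_obs = np.array([0.0, 0.0])               # obstacle (or another agent)
u_nom = np.array([-1.0, 0.0])              # heading straight at it
print(cbf_filter(u_nom, x, x_obs))         # braked, collision-free command
```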
https://arxiv.org/abs/2501.07566
Improving robustness to uncertainty and rejection of external disturbances represents a significant challenge in aerial robotics. Nonlinear controllers based on Incremental Nonlinear Dynamic Inversion (INDI), known for their ability to estimate disturbances through measured, filtered data, have been notably used in such applications. Typically, these controllers comprise two cascaded loops: an inner loop employing nonlinear dynamic inversion and an outer loop generating the virtual control inputs via linear controllers. In this paper, a novel methodology is introduced that combines the advantages of INDI with the robustness of linear structured $\mathcal{H}_\infty$ controllers. A full cascaded architecture is proposed to control the dynamics of a multirotor drone, covering both stabilization and guidance. In particular, low-order $\mathcal{H}_\infty$ controllers are designed for the outer loop by properly structuring the problem and solving it through non-smooth optimization. A comparative analysis is conducted between an existing INDI/PD approach and the proposed INDI/$\mathcal{H}_\infty$ strategy, showing a notable enhancement in the rejection of external disturbances. It is carried out first using MATLAB simulations involving a nonlinear model of a Parrot Bebop quadcopter drone, and then experimentally using a customized quadcopter built by the ENAC team. The results show an improvement of more than 50% in the rejection of disturbances such as gusts.
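The inner-loop increment law behind INDI's disturbance rejection can be written in a few lines; below is a textbook-form numpy sketch with a toy effectiveness matrix, not the paper's flight code.

```python
import numpy as np

def indi_increment(u_prev, nu_cmd, accel_filt, B):
    """One INDI inner-loop step.

    u_prev:     last applied control input
    nu_cmd:     virtual control (desired acceleration) from the outer loop
    accel_filt: filtered measured acceleration (carries the disturbance)
    B:          control-effectiveness matrix, accel ~ accel_filt + B (u - u_prev)
    """
    du = np.linalg.solve(B, nu_cmd - accel_filt)   # increment, not absolute u
    return u_prev + du

B = np.diag([4.0, 4.0, 9.0])                 # toy effectiveness matrix
u = np.zeros(3)
nu = np.array([1.0, 0.0, -0.5])              # from outer loop (PD or H-infinity)
accel = np.array([0.2, -0.1, 0.0])           # measured, filtered acceleration
u = indi_increment(u, nu, accel, B)
print(u)
```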
https://arxiv.org/abs/2501.07223
Cross-view object geo-localization (CVOGL) aims to locate an object of interest in a captured ground- or drone-view image within the satellite image. However, existing works treat ground-view and drone-view query images equivalently, overlooking their inherent viewpoint discrepancies and the spatial correlation between the query image and the satellite-view reference image. To this end, this paper proposes a novel View-specific Attention Geo-localization method (VAGeo) for accurate CVOGL. Specifically, VAGeo contains two key modules: a view-specific positional encoding (VSPE) module and a channel-spatial hybrid attention (CSHA) module. At the object level, according to the characteristics of the different viewpoints of ground and drone query images, viewpoint-specific positional encodings are designed in the VSPE module to more accurately identify the click-point object of the query image. At the feature level, a hybrid attention in the CSHA module is introduced by combining channel and spatial attention mechanisms to learn discriminative features. Extensive experimental results demonstrate that the proposed VAGeo gains a significant performance improvement, i.e., improving acc@0.25/acc@0.5 on the CVOGL dataset from 45.43%/42.24% to 48.21%/45.22% for ground-view, and from 61.97%/57.66% to 66.19%/61.87% for drone-view.
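The abstract leaves the CSHA internals open; a CBAM-style block that applies channel attention followed by spatial attention, sketched below in PyTorch, illustrates the kind of hybrid such a module combines. The structure is an assumption, not the paper's code.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style stand-in for a channel-spatial hybrid attention block."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(                 # shared channel-attention MLP
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))        # (B, C) pooled descriptors
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)   # channel gate
        s = torch.cat([x.mean(1, keepdim=True),   # (B, 2, H, W) maps
                       x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))          # spatial gate

feat = torch.randn(2, 32, 16, 16)
print(ChannelSpatialAttention(32)(feat).shape)    # torch.Size([2, 32, 16, 16])
```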
https://arxiv.org/abs/2501.07194
In this paper, we present a large-scale fine-grained dataset using high-resolution images captured from locations worldwide. Compared to existing datasets, our dataset offers a significantly larger size and includes a higher level of detail, making it uniquely suited for fine-grained 3D applications. Notably, our dataset is built using drone-captured aerial imagery, which provides a more accurate perspective for capturing real-world site layouts and architectural structures. By reconstructing environments with these detailed images, our dataset supports applications such as the COLMAP format for Gaussian Splatting and the Structure-from-Motion (SfM) method. It is compatible with widely used techniques including SLAM, Multi-View Stereo, and Neural Radiance Fields (NeRF), enabling accurate 3D reconstructions and point clouds. This makes it a benchmark for reconstruction and segmentation tasks. The dataset enables seamless integration with multi-modal data, supporting a range of 3D applications, from architectural reconstruction to virtual tourism. Its flexibility promotes innovation, facilitating breakthroughs in 3D modeling and analysis.
https://arxiv.org/abs/2501.06927
Detecting small targets in drone imagery is challenging due to low resolution, complex backgrounds, and dynamic scenes. We propose EDNet, a novel edge-target detection framework built on an enhanced YOLOv10 architecture, optimized for real-time applications without post-processing. EDNet incorporates an XSmall detection head and a Cross Concat strategy to improve feature fusion and multi-scale context awareness for detecting tiny targets in diverse environments. Our unique C2f-FCA block employs Faster Context Attention to enhance feature extraction while reducing computational complexity. The WIoU loss function is employed for improved bounding box regression. With seven model sizes ranging from Tiny to XL, EDNet accommodates various deployment environments, enabling local real-time inference and ensuring data privacy. Notably, EDNet achieves up to a 5.6% gain in mAP@50 with significantly fewer parameters. On an iPhone 12, EDNet variants operate at speeds ranging from 16 to 55 FPS, providing a scalable and efficient solution for edge-based object detection in challenging drone imagery. The source code and pre-trained models are available at: this https URL.
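The WIoU loss builds a focusing weight on top of ordinary IoU regression; the abstract does not give that weighting, so the sketch below shows only the IoU-regression core it extends (a generic PyTorch version, not EDNet's code).

```python
import torch

def iou_loss(pred, target, eps=1e-7):
    """1 - IoU for axis-aligned boxes given as (..., 4) = (x1, y1, x2, y2).

    Plain IoU regression; WIoU adds a distance-based focusing weight on
    top of this quantity.
    """
    x1 = torch.maximum(pred[..., 0], target[..., 0])
    y1 = torch.maximum(pred[..., 1], target[..., 1])
    x2 = torch.minimum(pred[..., 2], target[..., 2])
    y2 = torch.minimum(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)
    return 1.0 - iou

pred = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
target = torch.tensor([[1.0, 1.0, 3.0, 3.0]])
print(iou_loss(pred, target))   # IoU = 1/7, loss ~ 0.857
```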
https://arxiv.org/abs/2501.05885
Designing efficient neural networks for embedded devices is a critical challenge, particularly in applications requiring real-time performance, such as aerial imaging with drones and UAVs for emergency responses. In this work, we introduce TakuNet, a novel light-weight architecture which employs techniques such as depth-wise convolutions and an early downsampling stem to reduce computational complexity while maintaining high accuracy. It leverages dense connections for fast convergence during training and uses 16-bit floating-point precision for optimization on embedded hardware accelerators. Experimental evaluation on two public datasets shows that TakuNet achieves near-state-of-the-art accuracy in classifying aerial images of emergency situations, despite its minimal parameter count. Real-world tests on embedded devices, namely Jetson Orin Nano and Raspberry Pi, confirm TakuNet's efficiency, achieving more than 650 fps on the 15W Jetson board, making it suitable for real-time AI processing on resource-constrained platforms and advancing the applicability of drones in emergency scenarios. The code and implementation details are publicly released.
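Not TakuNet itself, but a minimal PyTorch sketch of the two ingredients named above: a depth-wise separable convolution block and an early-downsampling stem that shrinks resolution fourfold before the costly stages (layer widths are arbitrary).

```python
import torch
import torch.nn as nn

def depthwise_separable(cin, cout, stride=1):
    """Depth-wise 3x3 convolution followed by a point-wise 1x1 convolution."""
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride, 1, groups=cin, bias=False),
        nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

# an early-downsampling stem: shrink resolution 4x before the deeper stages
stem = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(16), nn.ReLU(inplace=True),
    depthwise_separable(16, 32, stride=2))

x = torch.randn(1, 3, 224, 224)
print(stem(x).shape)      # torch.Size([1, 32, 56, 56])
```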
https://arxiv.org/abs/2501.05880
The paper investigates path-planning techniques for multi-copter uncrewed aerial vehicles (UAVs) cooperating in a formation shape to examine surrounding surfaces. We first describe the problem as a joint objective cost for planning a path of the formation centroid working in a complicated space. The path-planning algorithm, named the generalized particle swarm optimization (GEPSO) algorithm, is then presented to construct an optimal, flyable path while avoiding obstacles and meeting the flying-mission requirements. A path-development scheme is then incorporated to generate a relevant path for each drone to maintain its position in the formation configuration. Simulation, comparison, and experiments have been conducted to verify the proposed approach. Results show the feasibility of the proposed GEPSO path-planning algorithm.
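As a rough illustration of the optimization step, the sketch below runs a vanilla PSO (not the paper's generalized GEPSO variant) over a few centroid waypoints, minimizing path length plus an obstacle penalty; the cost terms and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
START, GOAL = np.array([0.0, 0.0]), np.array([10.0, 10.0])
OBST, R = np.array([5.0, 5.0]), 2.0          # one circular no-fly zone

def cost(wps):
    """Path length plus a penalty for entering the obstacle disc."""
    pts = np.vstack([START, wps.reshape(-1, 2), GOAL])
    length = np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()
    pen = np.maximum(0.0, R - np.linalg.norm(pts - OBST, axis=1)).sum()
    return length + 50.0 * pen

def pso(n_particles=30, n_wp=3, iters=200, w=0.7, c1=1.5, c2=1.5):
    dim = 2 * n_wp
    x = rng.uniform(0, 10, (n_particles, dim))
    v = np.zeros_like(x)
    pbest, pcost = x.copy(), np.array([cost(p) for p in x])
    g = pbest[pcost.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        c = np.array([cost(p) for p in x])
        better = c < pcost
        pbest[better], pcost[better] = x[better], c[better]
        g = pbest[pcost.argmin()].copy()
    return g.reshape(-1, 2)

print(pso())   # waypoints bending the centroid path around the obstacle
```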
https://arxiv.org/abs/2501.05770
The recent widespread adoption of drones for studying marine animals provides opportunities for deriving biological information from aerial imagery. The large scale of imagery data acquired from drones is well suited for machine learning (ML) analysis. Development of ML models for analyzing marine animal aerial imagery has followed the classical paradigm of training, testing, and deploying a new model for each dataset, requiring significant time, human effort, and ML expertise. We introduce Frame Level ALIgnment and tRacking (FLAIR), which leverages the video understanding of Segment Anything Model 2 (SAM2) and the vision-language capabilities of Contrastive Language-Image Pre-training (CLIP). FLAIR takes a drone video as input and outputs segmentation masks of the species of interest across the video. Notably, FLAIR leverages a zero-shot approach, eliminating the need for labeled data, training a new model, or fine-tuning an existing model to generalize to other species. With a dataset of 18,000 drone images of Pacific nurse sharks, we trained state-of-the-art object detection models to compare against FLAIR. We show that FLAIR massively outperforms these object detectors and performs competitively against two human-in-the-loop methods for prompting SAM2, achieving a Dice score of 0.81. FLAIR readily generalizes to other shark species without additional human effort and can be combined with novel heuristics to automatically extract relevant information including length and tailbeat frequency. FLAIR has significant potential to accelerate aerial imagery analysis workflows, requiring markedly less human effort and expertise than traditional machine learning workflows, while achieving superior accuracy. By reducing the effort required for aerial imagery analysis, FLAIR allows scientists to spend more time interpreting results and deriving insights about marine ecosystems.
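The Dice score reported above measures mask overlap; a minimal numpy version for binary masks:

```python
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice overlap between two binary masks: 2|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 1.0 if total == 0 else 2.0 * inter / total

pred = np.zeros((4, 4), bool); pred[0:2, 0:2] = True
truth = np.zeros((4, 4), bool); truth[0:2, 1:3] = True
print(dice_score(pred, truth))   # 2*2 / (4+4) = 0.5
```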
https://arxiv.org/abs/2501.05717
The convergence of drone delivery systems, virtual worlds, and blockchain has transformed logistics and supply chain management, providing a fast and environmentally friendly alternative to traditional ground transportation. To provide users with a real-world experience, virtual service providers need to collect up-to-the-minute delivery information from edge devices. To address this challenge: 1) a reinforcement learning approach is introduced to give drones fast training capabilities and the ability to autonomously adapt to new virtual scenarios for effective resource allocation; 2) a semantic communication framework for metaverses is proposed, which uses the extraction of semantic information to reduce communication cost and incentivize the transmission of information for metaverse services; and 3) to ensure user information security, a lightweight authentication and key agreement scheme between the drone and the user is designed by introducing blockchain technology. In our experiments, drone adaptation performance is improved by about 35%, and the local offloading rate reaches 90% as the number of base stations increases. The semantic communication system proposed in this paper is compared with a cross-entropy baseline model. With blockchain technology introduced, transaction throughput is maintained at a stable value across different numbers of drones.
https://arxiv.org/abs/2501.04480
Objective: This paper describes the development of hybrid artificial intelligence strategies for drone navigation. Methods: The navigation module combines a deep learning model with a rule-based engine depending on the agent state. The deep learning model has been trained using reinforcement learning. The rule-based engine uses expert knowledge to deal with specific situations. The navigation module incorporates several strategies to explain the drone decision based on its observation space, and different mechanisms for including human decisions in the navigation process. Finally, this paper proposes an evaluation methodology based on defining several scenarios and analyzing the performance of the different strategies according to metrics adapted to each scenario. Results: Two main navigation problems have been studied. For the first scenario (reaching known targets), it has been possible to obtain a 90% task completion rate, reducing significantly the number of collisions thanks to the rule-based engine. For the second scenario, it has been possible to reduce 20% of the time required to locate all the targets using the reinforcement learning model. Conclusions: Reinforcement learning is a very good strategy to learn policies for drone navigation, but in critical situations, it is necessary to complement it with a rule-based module to increase task success rate.
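A minimal sketch of the switching logic described under Methods, with entirely illustrative rule thresholds and a random stand-in for the trained RL policy:

```python
import random

def rule_engine(state):
    """Expert rules take over in specific situations (names illustrative)."""
    if state["obstacle_dist"] < 2.0:
        return "climb"                     # hard-coded collision avoidance
    if state["battery"] < 0.15:
        return "return_home"
    return None                            # no rule fires

def rl_policy(state):
    """Stand-in for the trained reinforcement-learning model's action."""
    return random.choice(["forward", "left", "right"])

def navigate(state):
    """Hybrid module: rules pre-empt the learned policy when they apply."""
    action = rule_engine(state)
    return action if action is not None else rl_policy(state)

print(navigate({"obstacle_dist": 1.2, "battery": 0.9}))   # -> 'climb'
print(navigate({"obstacle_dist": 9.0, "battery": 0.9}))   # learned action
```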
https://arxiv.org/abs/2501.04472
Games have been vital test beds for the rapid development of agent-based research. Remarkable progress has been achieved in the past, but it is unclear whether the findings transfer to real-world problems. As pressure grows, some of the most critical ecological challenges can find mitigation and prevention solutions through technology and its applications. Most real-world domains include multi-agent scenarios and require machine-machine and human-machine collaboration. Open-source environments have not kept pace and are often toy scenarios that are too abstract or not suitable for multi-agent research. By mimicking real-world problems and increasing the complexity of environments, we hope to advance state-of-the-art multi-agent research and inspire researchers to work on immediate real-world problems. Here, we present HIVEX, an environment suite to benchmark multi-agent research focusing on ecological challenges. HIVEX includes the following environments: Wind Farm Control, Wildfire Resource Management, Drone-Based Reforestation, Ocean Plastic Collection, and Aerial Wildfire Suppression. We provide environments, training examples, and baselines for the main and sub-tasks. All trained models resulting from the experiments of this work are hosted on Hugging Face. We also provide a leaderboard on Hugging Face and encourage the community to submit models trained on our environment suite.
https://arxiv.org/abs/2501.04180
The multirotor unmanned aerial vehicle is a prevailing type of aircraft with wide real-world applications. Energy efficiency is a critical aspect of its performance, determining the range and duration of the missions that can be performed. In this study, we show both analytically and numerically that the optimum of a key energy-efficiency index in forward flight, namely energy per meter traveled per unit mass, is a constant under different vehicle mass (including payload). Note that this relationship holds only at the optimal forward velocity that minimizes energy consumption for each mass, not at arbitrary velocities. The study is based on a previously developed model capturing the first-principle energy dynamics of the multirotor, and a key step is to prove that the pitch angle under the optimal velocity is a constant. By employing both analytical derivation and validation studies, the research provides critical insights into the optimization of multirotor energy efficiency and facilitates the development of flight control strategies to extend mission duration and range.
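A compact restatement of the index in question, using symbols inferred from the abstract (a sketch, not the paper's notation): with $P_m(v)$ the steady forward-flight power at speed $v$ for total mass $m$,

```latex
% energy per meter traveled per unit mass, as a function of forward speed v
\[
  e_m(v) = \frac{P_m(v)}{m\,v}
\]
% the paper's claim: at the energy-optimal speed v^*(m) = \arg\min_v e_m(v),
% the optimum value is the same for every mass m (payload included),
\[
  e_{m_1}\bigl(v^*(m_1)\bigr) = e_{m_2}\bigl(v^*(m_2)\bigr)
  \quad \text{for all masses } m_1, m_2,
\]
% which follows from the pitch angle at v^* being constant across mass.
```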
https://arxiv.org/abs/2501.03102
We present an AI pipeline that uses smart drones equipped with computer vision to obtain a more accurate fruit count and yield estimate for the number of blueberries in a field. The core components are two object-detection models based on the YOLO deep learning architecture: a Bush Model that is able to detect blueberry bushes from images captured at low altitudes and at different angles, and a Berry Model that can detect individual berries that are visible on a bush. Together, both models allow for more accurate crop yield estimation by allowing intelligent control of the drone's position and camera to safely capture side-view images of bushes up close. In addition to providing experimental results for our models, which show good accuracy in terms of precision and recall when captured images are cropped around the foreground center bush, we also describe how to deploy our models to map out blueberry fields using different sampling strategies, and discuss the challenges of annotating very small objects (blueberries) and difficulties in evaluating the effectiveness of our models.
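A toy back-of-the-envelope showing how the two detectors' outputs could combine into a field-level yield estimate; every number here, including the visibility correction, is an illustrative assumption rather than a figure from the paper:

```python
# Toy yield estimate from the two detectors' outputs (numbers illustrative):
# the Bush Model counts bushes over sampled rows; the Berry Model counts
# visible berries on close-up side views of a sample of those bushes.
bushes_in_sampled_rows = 120
sampled_fraction_of_field = 0.25            # drone flew 1/4 of the rows
berry_counts_per_sampled_bush = [143, 160, 151, 170, 138]

est_bushes = bushes_in_sampled_rows / sampled_fraction_of_field
mean_visible_berries = sum(berry_counts_per_sampled_bush) / len(
    berry_counts_per_sampled_bush)
visibility_correction = 2.0                 # assumed: ~half the berries visible

est_yield = est_bushes * mean_visible_berries * visibility_correction
print(f"estimated berries in field: {est_yield:,.0f}")
```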
https://arxiv.org/abs/2501.02344
Unmanned aerial vehicle object detection (UAV-OD) has been widely used in various scenarios. However, most existing UAV-OD algorithms rely on manually designed components, which require extensive tuning. End-to-end models that do not depend on such manually designed components are mainly designed for natural images, which are less effective for UAV imagery. To address such challenges, this paper proposes an efficient detection transformer (DETR) framework tailored for UAV imagery, i.e., UAV-DETR. The framework includes a multi-scale feature fusion with frequency enhancement module, which captures both spatial and frequency information at different scales. In addition, a frequency-focused down-sampling module is presented to retain critical spatial details during down-sampling. A semantic alignment and calibration module is developed to align and fuse features from different fusion paths. Experimental results demonstrate the effectiveness and generalization of our approach across various UAV imagery datasets. On the VisDrone dataset, our method improves AP by 3.1% and $\text{AP}_{50}$ by 4.2% over the baseline. Similar enhancements are observed on the UAVVaste dataset. The project page: this https URL
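The abstract does not detail the frequency-enhancement module. One generic way to inject frequency information into a feature map uses the 2-D FFT, as in the toy sketch below; the radial high-frequency weighting is an illustrative assumption, not the paper's design.

```python
import torch

def frequency_enhance(feat: torch.Tensor, boost: float = 0.5) -> torch.Tensor:
    """Toy frequency-domain enhancement of a (B, C, H, W) feature map.

    Amplifies high-frequency components in the 2-D spectrum, then inverts
    the transform; a generic illustration, not the paper's module.
    """
    spec = torch.fft.rfft2(feat, norm="ortho")           # (B, C, H, W//2+1)
    h, w = spec.shape[-2], spec.shape[-1]
    # radial high-pass weight: 1 near DC, (1 + boost) at the band edge
    fy = torch.fft.fftfreq(feat.shape[-2]).abs().view(h, 1)
    fx = torch.fft.rfftfreq(feat.shape[-1]).abs().view(1, w)
    weight = 1.0 + boost * torch.sqrt(fy**2 + fx**2) / 0.7071
    return torch.fft.irfft2(spec * weight, s=feat.shape[-2:], norm="ortho")

x = torch.randn(1, 8, 32, 32)
print(frequency_enhance(x).shape)   # torch.Size([1, 8, 32, 32])
```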
https://arxiv.org/abs/2501.01855
Autonomous vehicles (AVs) rely on accurate trajectory prediction of surrounding vehicles to ensure the safety of both passengers and other road users. Trajectory prediction spans both short-term and long-term horizons, each requiring distinct considerations: short-term predictions rely on accurately capturing the vehicle's dynamics, while long-term predictions rely on accurately modeling the interaction patterns within the environment. However, current approaches, whether physics-based or learning-based models, ignore these distinct considerations, making them struggle to find the optimal prediction for both short-term and long-term horizons. In this paper, we introduce the Dynamics-Enhanced Learning MOdel (DEMO), a novel approach that combines a physics-based Vehicle Dynamics Model with advanced deep learning algorithms. DEMO employs a two-stage architecture, featuring a Dynamics Learning Stage and an Interaction Learning Stage, where the former focuses on capturing vehicle motion dynamics and the latter focuses on modeling interaction. By capitalizing on the respective strengths of both methods, DEMO facilitates multi-horizon predictions for future trajectories. Experimental results on the Next Generation Simulation (NGSIM), Macau Connected Autonomous Driving (MoCAD), Highway Drone (HighD), and nuScenes datasets demonstrate that DEMO outperforms state-of-the-art (SOTA) baselines in both short-term and long-term prediction horizons.
https://arxiv.org/abs/2412.20784
Motion capture has become increasingly important, not only in computer animation but also in emerging fields like virtual reality, bioinformatics, and humanoid training. Capturing outdoor environments offers extended-horizon scenes but introduces challenges with occlusions and obstacles. Recent approaches using multi-drone systems to capture multi-actor scenes often fail to account for multi-view consistency and reasoning across cameras in cluttered environments. Coordinated motion Capture (CoCap), inspired by Conflict-Based Search (CBS), addresses this issue by coordinating view planning to ensure multi-view reasoning during conflicts. In scenarios with high occlusions and obstacles, where the likelihood of inter-robot collisions increases, CoCap demonstrates performance that approaches the ideal outcomes of unconstrained planning, outperforming existing sequential planning methods. Additionally, CoCap offers a single-robot view-search approach for real-time applications in dense environments.
https://arxiv.org/abs/2412.20695