This project aims to revolutionize drone flight control by implementing a nonlinear Deep Reinforcement Learning (DRL) agent as a replacement for traditional linear Proportional-Integral-Derivative (PID) controllers. The primary objective is to transition drones seamlessly between manual and autonomous modes, enhancing responsiveness and stability. We use the Proximal Policy Optimization (PPO) reinforcement learning strategy within the Gazebo simulator to train the DRL agent. Adding a $20,000 indoor Vicon tracking system offers <1 mm positioning accuracy, which significantly improves autonomous flight precision. To navigate the drone along the shortest collision-free trajectory, we also build a three-dimensional A* path planner and successfully deploy it in real flight.
https://arxiv.org/abs/2404.00204
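A minimal sketch of a three-dimensional A* planner of the kind described above, over a voxel occupancy grid with 6-connected moves and a Manhattan heuristic (the grid representation and cost model are assumptions for illustration, not the project's actual implementation):

```python
import heapq

def astar_3d(grid, start, goal):
    """A* over a 3-D occupancy grid (1 = obstacle), 6-connected unit-cost moves."""
    def h(p):  # Manhattan distance: admissible and consistent for unit step costs
        return sum(abs(a - b) for a, b in zip(p, goal))
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    open_set = [(h(start), 0, start, None)]  # (f, g, node, parent)
    came, g = {}, {start: 0}
    while open_set:
        _, cost, cur, parent = heapq.heappop(open_set)
        if cur in came:          # stale duplicate entry
            continue
        came[cur] = parent
        if cur == goal:          # reconstruct path from parent links
            path = [cur]
            while came[path[-1]] is not None:
                path.append(came[path[-1]])
            return path[::-1]
        x, y, z = cur
        for dx, dy, dz in ((1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)):
            n = (x + dx, y + dy, z + dz)
            if (0 <= n[0] < nx and 0 <= n[1] < ny and 0 <= n[2] < nz
                    and grid[n[0]][n[1]][n[2]] == 0 and cost + 1 < g.get(n, float("inf"))):
                g[n] = cost + 1
                heapq.heappush(open_set, (g[n] + h(n), g[n], n, cur))
    return None  # no collision-free path exists
```

With a consistent heuristic this returns a shortest collision-free path, matching the "shortest collision-free trajectory" objective in the abstract.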
The goal of field boundary delineation is to predict the polygonal boundaries and interiors of individual crop fields in overhead remotely sensed images (e.g., from satellites or drones). Automatic delineation of field boundaries is a necessary task for many real-world use cases in agriculture, such as estimating cultivated area in a region or predicting end-of-season yield in a field. Field boundary delineation can be framed as an instance segmentation problem, but presents unique research challenges compared to traditional computer vision datasets used for instance segmentation. The practical applicability of previous work is also limited by the assumption that a sufficiently-large labeled dataset is available where field boundary delineation models will be applied, which is not the reality for most regions (especially under-resourced regions such as Sub-Saharan Africa). We present an approach for segmentation of crop field boundaries in satellite images in regions lacking labeled data that uses multi-region transfer learning to adapt model weights for the target region. We show that our approach outperforms existing methods and that multi-region transfer learning substantially boosts performance for multiple model architectures. Our implementation and datasets are publicly available to enable use of the approach by end-users and serve as a benchmark for future work.
https://arxiv.org/abs/2404.00179
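The core idea above — pretrain on labeled source regions, then adapt the weights to a target region with little data — can be sketched with a toy linear model standing in for the paper's segmentation networks (the weight-averaging initialization scheme here is an illustrative assumption):

```python
import numpy as np

def fit_linear(X, y, w0=None, steps=200, lr=0.1):
    """Least-squares linear model by gradient descent, optionally warm-started."""
    w = np.zeros(X.shape[1]) if w0 is None else w0.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def multi_region_transfer(source_sets, X_tgt, y_tgt, steps=50):
    """Toy multi-region transfer: fit one model per labeled source region,
    average the weights as an initialization, then briefly fine-tune on the
    small target-region set. A linear model stands in for a segmentation
    network; real transfer would fine-tune network weights the same way."""
    ws = [fit_linear(X, y) for X, y in source_sets]
    w0 = np.mean(ws, axis=0)           # aggregate source-region knowledge
    return fit_linear(X_tgt, y_tgt, w0=w0, steps=steps)
```

Warm-starting from the source regions lets the short target fine-tune succeed where training from scratch on the small target set would be less reliable.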
Legal autonomy - the lawful activity of artificial intelligence agents - can be achieved in one of two ways: either by imposing constraints on AI actors such as developers, deployers and users, and on AI resources such as data, or by imposing constraints on the range and scope of the impact that AI agents can have on the environment. The latter approach involves encoding extant rules concerning AI-driven devices into the software of the AI agents controlling those devices (e.g., encoding rules about limitations on zones of operation into the agent software of an autonomous drone). This is a challenge, since the effectiveness of such an approach requires a method of extracting, loading, transforming and computing legal information that is both explainable and legally interoperable, and that enables AI agents to reason about the law. In this paper, we sketch a proof of principle for such a method using large language models (LLMs), expert legal systems known as legal decision paths, and Bayesian networks. We then show how the proposed method could be applied to extant regulation in matters of autonomous cars, such as the California Vehicle Code.
https://arxiv.org/abs/2403.18537
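The pipeline above ends in Bayesian-network reasoning over extracted rules. As an illustrative toy (the variables, rule, and all probabilities below are invented, not taken from the paper), exact inference by enumeration over a two-node network lets an agent weigh whether it is likely operating in a restricted zone given an observed violation:

```python
# Toy Bayesian network: RestrictedZone -> Violation (all numbers invented).
P_zone = {True: 0.2, False: 0.8}              # prior P(drone is in a restricted zone)
P_viol = {True:  {True: 0.9,  False: 0.1},    # P(Violation | RestrictedZone=True)
          False: {True: 0.05, False: 0.95}}   # P(Violation | RestrictedZone=False)

def p_violation():
    """Marginal P(Violation=True) by enumerating the parent variable."""
    return sum(P_zone[z] * P_viol[z][True] for z in (True, False))

def p_zone_given_violation():
    """Posterior P(RestrictedZone=True | Violation=True) via Bayes' rule."""
    return P_zone[True] * P_viol[True][True] / p_violation()
```

Enumeration is exact and trivially explainable for small networks, which matches the explainability requirement the abstract emphasizes; real deployments would use larger networks and proper inference libraries.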
As drone technology advances, using unmanned aerial vehicles for aerial surveys has become the dominant trend in modern low-altitude remote sensing. The surge in aerial video data necessitates accurate prediction of future scenarios and of the motion states of the target of interest, particularly in applications like traffic management and disaster response. Existing video prediction methods focus solely on predicting future scenes (video frames) and neglect to explicitly model the target's motion states, which are crucial for aerial video interpretation. To address this issue, we introduce a novel task called Target-Aware Aerial Video Prediction, aiming to simultaneously predict future scenes and the motion states of the target. Further, we design a model specifically for this task, named TAFormer, which provides a unified modeling approach for both video and target motion states. Specifically, we introduce Spatiotemporal Attention (STA), which decouples the learning of video dynamics into spatial static attention and temporal dynamic attention, effectively modeling scene appearance and motion. Additionally, we design an Information Sharing Mechanism (ISM), which elegantly unifies the modeling of video and target motion by facilitating information interaction through two sets of messenger tokens. Moreover, to alleviate the difficulty of distinguishing targets in blurry predictions, we introduce Target-Sensitive Gaussian Loss (TSGL), enhancing the model's sensitivity to both the target's position and its content. Extensive experiments on UAV123VP and VisDroneVP (derived from single-object tracking datasets) demonstrate the exceptional performance of TAFormer in target-aware video prediction, showcasing its adaptability to the additional requirement of aerial video interpretation for target awareness.
https://arxiv.org/abs/2403.18238
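The decoupling idea behind STA — attend across spatial patches within each frame, then across time along each patch track — can be sketched in NumPy (shapes, single head, and the absence of learned projections are simplifying assumptions, not TAFormer's exact architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    """Scaled dot-product self-attention over the second-to-last axis."""
    w = softmax(q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1]))
    return w @ v

def decoupled_sta(x):
    """x: (T, N, D) tokens for T frames of N spatial patches.
    Spatial attention mixes patches within each frame (scene appearance);
    temporal attention mixes each patch's track across frames (motion)."""
    spatial = attend(x, x, x)                     # (T, N, D): per-frame mixing
    xt = spatial.swapaxes(0, 1)                   # (N, T, D): per-patch track
    temporal = attend(xt, xt, xt).swapaxes(0, 1)  # back to (T, N, D)
    return temporal
```

Factoring full spatiotemporal attention into these two passes reduces cost from O((TN)²) to O(TN² + NT²) attention weights, which is the usual motivation for this decomposition.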
Automating current bridge visual inspection practices using drones and image processing techniques is a prominent way to make these inspections more effective, robust, and less expensive. In this paper, we investigate the development of a novel deep-learning method for the detection of fatigue cracks in high-resolution images of steel bridges. First, we present a novel and challenging dataset comprising images of cracks in steel bridges. Second, we integrate the ConvNext neural network with a previous state-of-the-art encoder-decoder network for crack segmentation. We study and report the effects of using background patches on network performance when applied to high-resolution images of cracks in steel bridges. Finally, we introduce a loss function that allows the use of more background patches during training, which yields a significant reduction in false positive rates.
https://arxiv.org/abs/2403.17725
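One simple way such a loss can admit more background patches is to down-weight patches that contain no crack pixels, so that abundant background does not swamp the crack signal. The sketch below (the weighting scheme is an assumption for illustration, not the paper's exact loss) implements a weighted per-patch binary cross-entropy:

```python
import numpy as np

def weighted_patch_bce(pred, target, bg_weight=0.3, eps=1e-7):
    """Per-patch binary cross-entropy where all-background patches (no crack
    pixels) contribute with a reduced weight, so many background patches can
    be used in training without dominating the loss.
    pred, target: (P, H, W) arrays; pred in (0, 1), target in {0, 1}."""
    pred = np.clip(pred, eps, 1 - eps)
    bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    per_patch = bce.mean(axis=(1, 2))                      # (P,) mean loss per patch
    w = np.where(target.any(axis=(1, 2)), 1.0, bg_weight)  # (P,) down-weight background
    return float((w * per_patch).sum() / w.sum())
```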
Time-optimal quadrotor flight is an extremely challenging problem due to the limited control authority encountered at the limit of handling. Model Predictive Contouring Control (MPCC) has emerged as a leading model-based approach for time-optimization problems such as drone racing. However, the standard MPCC formulation used in quadrotor racing introduces the notion of the gates directly in the cost function, creating a multi-objective optimization that continuously trades off between maximizing progress and tracking the path accurately. This paper introduces three key components that enhance the MPCC approach for drone racing. First and foremost, we provide safety guarantees in the form of a constraint and terminal set. The safety set is designed as a spatial constraint that prevents gate collisions while allowing time-optimization to take place only in the cost function. Second, we augment the existing first-principles dynamics with a residual term that captures complex aerodynamic effects and thrust forces learned directly from real-world data. Third, we use Trust Region Bayesian Optimization (TuRBO), a state-of-the-art global Bayesian optimization algorithm, to tune the hyperparameters of the MPC controller given a sparse reward based on lap-time minimization. The proposed approach achieves lap times similar to the best state-of-the-art RL approaches and outperforms the best time-optimal controller while satisfying constraints. In both simulation and the real world, our approach consistently prevents gate crashes with a 100\% success rate, while pushing the quadrotor to its physical limit, reaching speeds of more than 80 km/h.
https://arxiv.org/abs/2403.17551
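The key shift above is moving gate handling from the cost into a hard spatial constraint. A toy version of such a check (axis-aligned gate geometry and the band width are simplifying assumptions, not the paper's actual safety set): whenever the trajectory is near the gate plane, it must lie within the gate opening.

```python
def trajectory_safe(traj, gate_x, half_w, half_h, band=0.3):
    """Spatial safety check: whenever a state is within `band` metres of the
    gate plane x = gate_x, it must pass through the gate opening
    (|y| <= half_w, |z| <= half_h). Axis-aligned geometry is assumed."""
    for x, y, z in traj:
        near_gate = abs(x - gate_x) <= band
        if near_gate and not (abs(y) <= half_w and abs(z) <= half_h):
            return False  # would hit the gate frame
    return True
```

Enforcing this as a constraint (rather than a cost term) means the optimizer can spend the cost function purely on minimizing time, which is the design point the abstract makes.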
Formation trajectory planning that uses complete graphs to model collaborative constraints becomes computationally intractable as the number of drones increases, due to the curse of dimensionality. To tackle this issue, this paper presents a sparse graph construction method for formation planning that realizes a better efficiency-performance trade-off. First, a sparsification mechanism for complete graphs is designed to ensure the global rigidity of the sparsified graphs, a necessary condition for a graph to correspond uniquely to a geometric shape. Second, a good sparse graph is constructed to sufficiently preserve the main structural features of the complete graph. Since the graph-based formation constraint is described by the Laplacian matrix, the sparse graph construction problem is equivalent to submatrix selection, which has combinatorial time complexity and requires a scoring metric. In comparative simulations, the Max-Trace matrix-revealing metric shows promising performance. The sparse graph is then integrated into formation planning. Simulation results with 72 drones in complex environments demonstrate that when preserving 30\% of the connection edges, our method has formation error and recovery performance comparable to complete graphs, while planning efficiency is improved by approximately an order of magnitude. Benchmark comparisons and ablation studies are conducted to fully validate the merits of our method.
https://arxiv.org/abs/2403.17288
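Since the formation constraint is encoded by the graph Laplacian, a minimal supporting sketch is building the weighted Laplacian from an edge list and sanity-checking that a sparsified edge set still yields a connected graph via the algebraic connectivity λ₂ (connectivity is necessary for, but strictly weaker than, the global rigidity the paper actually enforces):

```python
import numpy as np

def laplacian(n, edges):
    """Weighted graph Laplacian L = D - A from (i, j, w) edges."""
    L = np.zeros((n, n))
    for i, j, w in edges:
        L[i, i] += w; L[j, j] += w
        L[i, j] -= w; L[j, i] -= w
    return L

def is_connected(n, edges, tol=1e-9):
    """Connectivity test via algebraic connectivity: the second-smallest
    eigenvalue of L is positive iff the graph is connected."""
    lam = np.linalg.eigvalsh(laplacian(n, edges))  # ascending eigenvalues
    return bool(lam[1] > tol)
```

Note that trace(L) equals twice the total edge weight, which is why submatrix-selection metrics such as Max-Trace are natural scoring functions in this setting.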
Nano-drones, distinguished by their agility, minimal weight, and cost-effectiveness, are particularly well-suited for exploration in confined, cluttered and narrow spaces. Recognizing transparent, highly reflective or absorbing materials, such as glass and metallic surfaces, is challenging, as classical sensors, such as cameras or laser rangers, often do not detect them. Inspired by bats, which can fly at high speeds in complete darkness with the help of ultrasound, this paper introduces \textit{BatDeck}, a pioneering sensor-deck employing a lightweight and low-power ultrasonic sensor for nano-drone autonomous navigation. This paper first provides insights into the sensor's characteristics, highlighting the influence of motor noise on the ultrasound readings, then presents the results of extensive experimental tests for obstacle avoidance (OA) in diverse environments. Results show that \textit{BatDeck} allows exploration for a flight time of 8 minutes, covering 136 m on average before crashing in a challenging environment with transparent and reflective obstacles, proving the effectiveness of ultrasonic sensors for OA on nano-drones.
https://arxiv.org/abs/2403.16696
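A single forward ultrasonic range reading is enough for a simple reactive avoidance rule of the kind such a deck enables. The sketch below (all thresholds are illustrative, not BatDeck's actual parameters) maps a range reading to a speed command:

```python
def avoid_command(range_m, stop_dist=0.5, slow_dist=1.5, cruise=1.0):
    """Map one forward ultrasonic range reading (metres) to a command:
    cruise when clear, slow down linearly inside the slow band, and stop
    and turn in place when too close. Thresholds are illustrative only."""
    if range_m < stop_dist:
        return {"v": 0.0, "yaw_rate": 0.8}   # too close: stop and turn away
    if range_m < slow_dist:
        scale = (range_m - stop_dist) / (slow_dist - stop_dist)
        return {"v": cruise * scale, "yaw_rate": 0.0}  # linear slow-down
    return {"v": cruise, "yaw_rate": 0.0}    # path clear: cruise
```

Because ultrasound works on glass and metal where cameras and laser rangers fail, even this trivial policy is meaningful on the transparent/reflective obstacles the abstract targets.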
This paper proposes an Emergency Battery Service (EBS) for drones, in which an EBS drone flies to a drone in the field with a depleted battery and transfers a fresh battery to the exhausted drone. The authors present a unique battery transfer mechanism and a drone localization method based on the Cross Marker Position (CMP) approach. The main challenges include achieving a stable, balanced transfer and precisely localizing the receiver drone. The proposed EBS drone mitigates the effects of downwash caused by the vertical proximity between the drones by aligning diagonally with the receiver, reducing the distance between the two drones to 0.5 m. CFD analysis shows that diagonal rather than perpendicular alignment minimizes turbulence, and the authors verify this on the actual system with output airflow and thrust measurements. The CMP marker-based localization method enables position lock for the EBS drone with up to 0.9 cm accuracy. The performance of the transfer mechanism is validated experimentally by a successful 5-second mid-air transfer, with the EBS drone within 0.5 m vertical distance of the receiver drone, during which 4 m/s turbulence does not affect the transfer process.
https://arxiv.org/abs/2403.16430
This paper addresses the problem of target search and tracking using a fleet of cooperating UAVs operating in an unknown region of interest containing an a priori unknown number of moving ground targets. Each drone is equipped with an embedded Computer Vision System (CVS), providing an image with labeled pixels and a depth map of the observed part of its environment. Moreover, a bounding box containing the corresponding pixels in the image frame is available when a UAV identifies a target. Hypotheses regarding the information provided by the pixel classification, depth map construction, and target identification algorithms are proposed to allow their exploitation by set-membership approaches. A set-membership target location estimator is developed using the information provided by the CVS. Each UAV evaluates sets guaranteed to contain the locations of the identified targets and a set possibly containing the locations of targets still to be identified. Each UAV then uses these sets to search for and track targets cooperatively.
https://arxiv.org/abs/2403.15113
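The basic operation in a set-membership estimator of this kind is intersecting guaranteed enclosures: each new measurement set shrinks the region known to contain the target. A minimal sketch with axis-aligned boxes (the box representation is an assumption; the paper's sets may be more general):

```python
def box_intersect(a, b):
    """Intersection of two axis-aligned boxes, each given as a tuple of
    per-axis (lo, hi) intervals; returns None when the boxes are disjoint.
    In a set-membership estimator, each guaranteed measurement set is
    intersected with the current enclosure of the target's location."""
    out = []
    for (lo1, hi1), (lo2, hi2) in zip(a, b):
        lo, hi = max(lo1, lo2), min(hi1, hi2)
        if lo > hi:        # empty on this axis: boxes are disjoint
            return None
        out.append((lo, hi))
    return tuple(out)
```

An empty intersection signals an inconsistency (e.g., a misidentified target), which set-membership methods can use to discard spurious hypotheses.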
In this paper, we explore the application of Unmanned Aerial Vehicles (UAVs) in maritime search and rescue (mSAR) missions, focusing on medium-sized fixed-wing drones and quadcopters. We address the challenges and limitations inherent in operating some of the different classes of UAVs, particularly in search operations. Our research includes the development of a comprehensive software framework designed to enhance the efficiency and efficacy of SAR operations. This framework combines preliminary detection onboard UAVs with advanced object detection at ground stations, aiming to reduce visual strain and improve decision-making for operators. It will be made publicly available upon publication. We conduct experiments to evaluate various Region of Interest (RoI) proposal methods, especially under simulated bandwidth limits, an important consideration for remote or offshore operations, which force the algorithm to prioritize some predictions over others.
https://arxiv.org/abs/2403.14281
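Prioritizing predictions under a bandwidth cap amounts to a greedy selection problem: transmit the highest-confidence RoI crops that fit the budget. A sketch (the bytes-per-pixel cost model is an assumption for illustration, not the paper's actual encoding):

```python
def prioritize_rois(rois, budget_bytes, bytes_per_px=0.5):
    """Greedily transmit the highest-confidence RoI crops that fit in the
    bandwidth budget. Each roi is (confidence, width_px, height_px); the
    per-pixel byte cost is an illustrative assumption."""
    chosen, used = [], 0
    for conf, w, h in sorted(rois, key=lambda r: -r[0]):  # best-first
        cost = int(w * h * bytes_per_px)
        if used + cost <= budget_bytes:
            chosen.append((conf, w, h))
            used += cost
    return chosen
```

Note the greedy policy can skip a large mid-confidence crop and still include a small lower-confidence one, exactly the kind of trade-off simulated bandwidth limits expose.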
Detecting marine objects inshore presents challenges owing to algorithmic intricacies and complexities in system deployment. We propose a difficulty-aware edge-cloud collaborative sensing system that splits the task into object localization and fine-grained classification. Objects are classified either at the edge or within the cloud, based on their estimated difficulty. The framework comprises a low-power device-tailored front-end model for object localization, classification, and difficulty estimation, along with a transformer-graph convolutional network-based back-end model for fine-grained classification. Our system demonstrates superior performance (mAP@0.5 +4.3%) on widely used marine object detection datasets, significantly reducing both data transmission volume (by 95.43%) and energy consumption (by 72.7%) at the system level. We validate the proposed system across various embedded system platforms and in real-world scenarios involving drone deployment.
https://arxiv.org/abs/2403.14027
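The difficulty-aware split reduces to a routing rule: classify easy objects on the edge device, ship hard ones to the cloud back-end. A minimal sketch (the threshold and the shape of a detection record are assumptions for illustration):

```python
def route_detections(detections, tau=0.35):
    """Difficulty-aware routing: each detection is (label, difficulty in [0, 1]).
    Detections below the threshold are classified on the edge device; the
    rest are sent to the cloud back-end. The threshold tau is illustrative."""
    edge = [d for d in detections if d[1] < tau]
    cloud = [d for d in detections if d[1] >= tau]
    return edge, cloud
```

Tuning tau trades transmission volume (fewer cloud uploads) against accuracy (harder cases benefit from the stronger back-end), which is the system-level trade-off the abstract quantifies.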
In Federated Learning (FL), multiple clients collaboratively train a global model without sharing private data. In semantic segmentation, the Federated source Free Domain Adaptation (FFreeDA) setting is of particular interest, where clients undergo unsupervised training after supervised pretraining at the server side. While few recent works address FL for autonomous vehicles, intrinsic real-world challenges such as the presence of adverse weather conditions and the existence of different autonomous agents are still unexplored. To bridge this gap, we address both problems and introduce a new federated semantic segmentation setting where both car and drone clients co-exist and collaborate. Specifically, we propose a novel approach for this setting which exploits a batch-norm weather-aware strategy to dynamically adapt the model to the different weather conditions, while hyperbolic space prototypes are used to align the heterogeneous client representations. Finally, we introduce FLYAWARE, the first semantic segmentation dataset with adverse weather data for aerial vehicles.
https://arxiv.org/abs/2403.13762
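The batch-norm weather-aware idea can be sketched as keeping one set of normalization statistics per weather condition, so rainy and clear batches do not contaminate each other's statistics (this bookkeeping is a simplified illustration; the paper's strategy adapts dynamically during federated training):

```python
import numpy as np

class WeatherBN:
    """Weather-aware normalization sketch: one set of feature statistics per
    weather condition; each batch is normalized with the statistics of its
    own condition. Statistics are frozen at first sight of a condition here,
    purely to keep the sketch short."""
    def __init__(self, eps=1e-5):
        self.stats = {}  # condition -> (mean, var)
        self.eps = eps

    def __call__(self, x, condition):
        if condition not in self.stats:
            self.stats[condition] = (x.mean(axis=0), x.var(axis=0))
        mean, var = self.stats[condition]
        return (x - mean) / np.sqrt(var + self.eps)
```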
This paper introduces CLIPSwarm, a new algorithm designed to automate the modeling of swarm drone formations from natural language. The algorithm begins by enriching a provided word to compose a text prompt, which serves as input to an iterative search for the formation that best matches the word. The algorithm iteratively refines formations of robots to align with the textual description, employing distinct "exploration" and "exploitation" steps. Our framework is currently evaluated on simple formation targets, limited to contour shapes. A formation is visually represented through alpha-shape contours, and the most representative color for the input word is found automatically. To measure the similarity between the description and the visual representation of the formation, we use CLIP [1], encoding text and images into vectors and assessing their similarity. Subsequently, the algorithm rearranges the formation to represent the word more effectively within the given constraints of available drones. Control actions are then assigned to the drones, ensuring robotic behavior and collision-free movement. Experimental results demonstrate the system's efficacy in accurately modeling robot formations from natural language descriptions. The algorithm's versatility is showcased through the execution of drone shows in photorealistic simulation with varying shapes. We refer the reader to the supplementary video for a visual reference of the results.
https://arxiv.org/abs/2403.13467
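The scoring step above is a cosine-similarity comparison between a text embedding and candidate image embeddings. A sketch of that loop with plain vectors standing in for real CLIP encoder outputs:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_formation(text_emb, candidate_embs):
    """Pick the candidate formation whose rendered-image embedding is most
    similar to the text-prompt embedding, as in CLIPSwarm's scoring step.
    Embeddings here are plain vectors; the real system obtains them from
    CLIP's text and image encoders."""
    scores = [cosine(text_emb, c) for c in candidate_embs]
    return int(np.argmax(scores)), scores
```

The exploration/exploitation loop in the abstract would repeatedly perturb formations, render them, and keep the ones this score prefers.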
Swarm robots have sparked remarkable developments across a range of fields. Although necessary for many swarm applications, fast and robust coordinate initialization in vision-based drone swarms remains elusive. To this end, our paper proposes a complete system to recover a swarm's initial relative poses on platforms with size, weight, and power (SWaP) constraints. To overcome the limited coverage of the field of view (FoV), the drones rotate in place to obtain observations. To tackle the anonymous measurements, we formulate a non-convex rotation estimation problem and transform it into a semi-definite programming (SDP) problem, which reliably attains globally optimal values. We then utilize the Hungarian algorithm to recover the relative translations and the correspondences between observations and drone identities. To safely acquire complete observations, we actively search for positions and generate feasible trajectories to avoid collisions. To validate the practicability of our system, we conduct experiments on a vision-based drone swarm with only stereo cameras and inertial measurement units (IMUs) as sensors. The results demonstrate that the system can robustly obtain accurate relative poses in real time with limited onboard computation resources. The source code is released.
https://arxiv.org/abs/2403.13455
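The correspondence-recovery step is a minimum-cost one-to-one assignment between anonymous observations and drone identities. For a small swarm this can be shown with brute force over permutations (standing in for the Hungarian algorithm the paper uses, which solves the same problem in polynomial time):

```python
from itertools import permutations

def min_cost_assignment(cost):
    """Minimum-cost one-to-one assignment for a small square cost matrix
    cost[i][j] = mismatch between observation i and drone identity j.
    Brute force over permutations stands in for the Hungarian algorithm;
    it is exact but only practical for small n."""
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_perm, best_cost = perm, c
    return best_perm, best_cost
```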
The widespread adoption of quadrotors for diverse applications, from agriculture to public safety, necessitates an understanding of the aerodynamic disturbances they create. This paper introduces a computationally lightweight model for estimating the time-averaged magnitude of the induced flow below quadrotors in hover. Unlike related approaches that rely on expensive computational fluid dynamics (CFD) simulations or time-consuming empirical measurements, our method leverages classical theory from turbulent flows. By analyzing over 9 hours of flight data from drones of varying sizes within a large motion capture system, we show that the combined flow from all propellers of the drone is well-approximated by a turbulent jet. Through the use of a novel normalization and scaling, we have developed and experimentally validated a unified model that describes the mean velocity field of the induced flow for different drone sizes. The model accurately describes the far-field airflow in a very large volume below the drone which is difficult to simulate in CFD. Our model, which requires only the drone's mass, propeller size, and drone size for calculations, offers a practical tool for dynamic planning in multi-agent scenarios, ensuring safer operations near humans and optimizing sensor placements.
https://arxiv.org/abs/2403.13321
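Classical round-jet theory predicts that the far-field centerline mean velocity decays inversely with distance from the source, with the jet's momentum flux set by the hover thrust T = mg. A sketch of such a model (the constant K below is illustrative, not the paper's fitted value):

```python
import math

def jet_centerline_speed(z, drone_mass_kg, K=2.4, rho=1.225, g=9.81):
    """Far-field time-averaged centreline speed of the induced flow at
    distance z (m) below a hovering drone, modelled as a turbulent round jet:
    u(z) = K * sqrt(T / rho) / z, with momentum flux T = m * g (total thrust).
    The 1/z decay follows classical jet theory; K here is illustrative."""
    thrust = drone_mass_kg * g                 # hover thrust balances weight
    return K * math.sqrt(thrust / rho) / z     # m/s (dimensionally consistent)
```

The model's inputs are exactly the quantities the abstract lists (mass and geometry), which is what makes it cheap enough for online multi-agent planning.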
Realizing consumer-grade drones that are as useful as robot vacuums throughout our homes or personal smartphones in our daily lives requires drones to sense, actuate, and respond to general scenarios that may arise. Towards this vision, we propose RASP, a modular and reconfigurable sensing and actuation platform that allows drones to autonomously swap onboard sensors and actuators in only 25 seconds, allowing a single drone to quickly adapt to a diverse range of tasks. RASP consists of a mechanical layer to physically swap sensor modules, an electrical layer to maintain power and communication lines to the sensor/actuator, and a software layer to maintain a common interface between the drone and any sensor module in our platform. Leveraging recent advances in large language and visual language models, we further introduce the architecture, implementation, and real-world deployments of a personal assistant system utilizing RASP. We demonstrate that RASP can enable a diverse range of useful tasks in home, office, lab, and other indoor settings.
https://arxiv.org/abs/2403.12853
We combine the effectiveness of Reinforcement Learning (RL) and the efficiency of Imitation Learning (IL) in the context of vision-based, autonomous drone racing. We focus on directly processing visual input without explicit state estimation. While RL offers a general framework for learning complex controllers through trial and error, it faces challenges in sample efficiency and computational demands due to the high dimensionality of visual inputs. Conversely, IL learns efficiently from visual demonstrations but is limited by the quality of those demonstrations and faces issues like covariate shift. To overcome these limitations, we propose a novel training framework combining the advantages of RL and IL. Our framework involves three stages: initial training of a teacher policy using privileged state information, distilling this policy into a student policy using IL, and performance-constrained adaptive RL fine-tuning. Our experiments in both simulated and real-world environments demonstrate that our approach achieves performance and robustness superior to IL or RL alone in navigating a quadrotor through a racing course using only visual information, without explicit state estimation.
https://arxiv.org/abs/2403.12203
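The second stage — distilling the privileged teacher into a student — is, at its core, supervised regression of student actions onto teacher actions over shared observations. A toy sketch with a linear student policy (the linear policy and squared-error loss are illustrative stand-ins for the paper's networks and IL objective):

```python
import numpy as np

def distill_step(student_w, obs, teacher_actions, lr=0.05):
    """One imitation (distillation) step: regress the student policy's
    action (a linear policy on visual features here) toward the privileged
    teacher's action on the same observations, via a squared-error gradient."""
    pred = obs @ student_w
    grad = obs.T @ (pred - teacher_actions) / len(obs)
    return student_w - lr * grad

def distill(student_w, obs, teacher_actions, steps=500):
    for _ in range(steps):
        student_w = distill_step(student_w, obs, teacher_actions)
    return student_w
```

Stage three would then fine-tune the distilled student with RL under a performance constraint, which the sketch does not cover.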
Unmanned Aerial Vehicles (UAVs) are gaining popularity in civil and military applications. However, uncontrolled access to restricted areas threatens privacy and security. Thus, prevention and detection of UAVs are pivotal to guarantee confidentiality and safety. Although active scanning, mainly based on radar, is one of the most accurate technologies, it can be expensive and less versatile than passive inspection, e.g., object recognition. Dynamic vision sensors (DVS) are bio-inspired event-based vision models that leverage timestamped pixel-level brightness changes, making them well suited to low-latency object detection in fast-moving scenes. This paper presents F-UAV-D (Fast Unmanned Aerial Vehicle Detector), an embedded system that enables fast-moving drone detection. In particular, we propose a setup that exploits DVS as an alternative to RGB cameras in a real-time, low-power configuration. Our approach leverages the high dynamic range (HDR) and background suppression of DVS and, when trained with various fast-moving drones, outperforms RGB input in suboptimal ambient conditions such as low illumination and fast-moving scenes. Our results show that F-UAV-D can (i) detect drones while drawing <15 W on average and (ii) perform real-time inference (i.e., <50 ms) by leveraging the CPU and GPU nodes of our edge computer.
https://arxiv.org/abs/2403.11875
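Event-based pipelines typically begin by accumulating the asynchronous event stream into a frame over a short time window; a fast-moving drone leaves dense event clusters while a static background stays near zero. A sketch (the (t, x, y, polarity) event-tuple format is assumed for illustration):

```python
import numpy as np

def accumulate_events(events, shape, t0, t1):
    """Accumulate DVS events (t, x, y, polarity in {-1, +1}) falling in the
    window [t0, t1) into a 2-D frame of signed counts. Moving objects leave
    dense clusters of events; static background contributes almost nothing,
    which is the background suppression the abstract refers to."""
    frame = np.zeros(shape, dtype=np.int32)
    for t, x, y, p in events:
        if t0 <= t < t1:
            frame[y, x] += p
    return frame
```

The resulting frames can then feed a conventional detector, which is how an event camera substitutes for an RGB camera in a pipeline like F-UAV-D's.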
A critical challenge in deploying unmanned aerial vehicles (UAVs) for autonomous tasks is their ability to navigate in an unknown environment. This paper introduces a novel vision-depth fusion approach for autonomous navigation on nano-UAVs. We combine the visual-based PULP-Dronet convolutional neural network for semantic information extraction, i.e., serving as the global perception, with 8x8px depth maps for close-proximity maneuvers, i.e., the local perception. When tested in-field, our integration strategy highlights the complementary strengths of both visual and depth sensory information. We achieve a 100% success rate over 15 flights in a complex navigation scenario, encompassing straight pathways, static obstacle avoidance, and 90° turns.
https://arxiv.org/abs/2403.11661
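A toy version of the vision-depth fusion rule above: follow the vision network's steering (global perception) unless the coarse depth map reports a large close obstacle (local perception), in which case brake. The thresholds and the override rule are illustrative assumptions, not the paper's actual fusion logic:

```python
import numpy as np

def fuse_commands(steer_vision, depth_map, stop_dist=0.5, block_frac=0.15):
    """Fuse global vision-based steering with a local 8x8 depth map: if more
    than block_frac of the depth pixels are closer than stop_dist metres,
    stop; otherwise cruise with the vision network's steering command."""
    close = float((depth_map < stop_dist).mean())
    if close > block_frac:
        return {"steer": steer_vision, "v": 0.0}  # close obstacle: brake
    return {"steer": steer_vision, "v": 1.0}      # path clear: cruise
```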