Detecting objects from aerial images poses significant challenges due to the following factors: 1) Aerial images typically have very large sizes, often with millions or even hundreds of millions of pixels, while computational resources are limited. 2) The small size of objects leads to insufficient information for effective detection. 3) Non-uniform object distribution leads to wasted computation. To address these issues, we propose YOLC (You Only Look Clusters), an efficient and effective framework that builds on an anchor-free object detector, CenterNet. To overcome the challenges posed by large-scale images and non-uniform object distribution, we introduce a Local Scale Module (LSM) that adaptively searches for cluster regions to zoom in on for accurate detection. Additionally, we modify the regression loss using Gaussian Wasserstein distance (GWD) to obtain high-quality bounding boxes. Deformable convolution and refinement methods are employed in the detection head to enhance the detection of small objects. We perform extensive experiments on two aerial image datasets, VisDrone2019 and UAVDT, to demonstrate the effectiveness and superiority of our proposed approach.
https://arxiv.org/abs/2404.06180
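As an illustration of the GWD regression loss mentioned in the YOLC abstract above: a box can be modeled as a 2-D Gaussian (mean = center, covariance = diag(w²/4, h²/4)), for which the 2-Wasserstein distance has a closed form. A minimal sketch under that standard box-to-Gaussian mapping; the conversion of the distance into a bounded loss (the `tau` / log transform) is a common recipe, not necessarily YOLC's exact formulation:

```python
import math

def box_to_gaussian(box):
    """Map an axis-aligned box (cx, cy, w, h) to a 2-D Gaussian:
    mean = centre, sqrt of the diagonal covariance = (w/2, h/2)."""
    cx, cy, w, h = box
    return (cx, cy), (w / 2.0, h / 2.0)

def gwd2(box_a, box_b):
    """Squared 2-Wasserstein distance between the Gaussians of two boxes.
    For diagonal covariances this reduces to
    ||m1 - m2||^2 + sum_i (sqrt(s1_i) - sqrt(s2_i))^2."""
    (ax, ay), (asx, asy) = box_to_gaussian(box_a)
    (bx, by), (bsx, bsy) = box_to_gaussian(box_b)
    return (ax - bx) ** 2 + (ay - by) ** 2 + (asx - bsx) ** 2 + (asy - bsy) ** 2

def gwd_loss(box_a, box_b, tau=1.0):
    """One common normalization of the distance into a bounded loss."""
    d = math.sqrt(gwd2(box_a, box_b))
    return 1.0 - 1.0 / (tau + math.log1p(d))
```

Unlike IoU, the distance stays informative even when boxes do not overlap, which is why it helps for tiny objects.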
Accurately distinguishing each object is a fundamental goal of multi-object tracking (MOT) algorithms. However, achieving this goal remains challenging, primarily because: (i) In crowded scenes with occluded objects, the high overlap of object bounding boxes leads to confusion among closely located objects. Nevertheless, humans naturally perceive the depth of elements in a scene when observing 2D videos. Inspired by this, even when the bounding boxes of objects are close on the camera plane, we can differentiate them in the depth dimension, thereby establishing a 3D perception of the objects. (ii) For videos with rapid and irregular camera motion, abrupt changes in object positions can result in ID switches. However, if the camera pose is known, we can compensate for the errors of linear motion models. In this paper, we propose \textit{DepthMOT}, which achieves: (i) detecting and estimating the scene depth map \textit{end-to-end}, and (ii) compensating for irregular camera motion by camera pose estimation. Extensive experiments demonstrate the superior performance of DepthMOT on the VisDrone-MOT and UAVDT datasets. The code will be available at \url{this https URL}.
https://arxiv.org/abs/2404.05518
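The depth-aware association idea in the DepthMOT abstract above can be sketched as a cost that mixes bounding-box IoU with a depth-disagreement penalty, so boxes that overlap on the image plane can still be separated by depth. The weighting (`w_depth`, `max_depth_gap`) and the greedy matcher below are illustrative assumptions, not the paper's actual formulation:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def association_cost(track, det, w_depth=0.5, max_depth_gap=5.0):
    """Cost = (1 - IoU) plus a penalty for depth disagreement.
    track/det: (box, depth). Hypothetical weighting for illustration."""
    box_t, z_t = track
    box_d, z_d = det
    depth_term = min(abs(z_t - z_d) / max_depth_gap, 1.0)
    return (1.0 - iou(box_t, box_d)) + w_depth * depth_term

def greedy_match(tracks, dets, max_cost=1.2):
    """Greedily pair tracks with detections by lowest cost."""
    pairs, matched_t, matched_d = [], set(), set()
    costs = sorted(
        (association_cost(t, d), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(dets)
    )
    for c, ti, di in costs:
        if c <= max_cost and ti not in matched_t and di not in matched_d:
            pairs.append((ti, di))
            matched_t.add(ti)
            matched_d.add(di)
    return pairs
```

With identical boxes at different depths, the depth term alone resolves the assignment, which is exactly the failure case IoU-only trackers hit in crowds.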
The study of non-line-of-sight (NLOS) imaging is growing due to its many potential applications, including rescue operations and pedestrian detection by self-driving cars. However, implementing NLOS imaging on a moving camera remains an open area of research. Existing NLOS imaging methods rely on time-resolved detectors and laser configurations that require precise optical alignment, making it difficult to deploy them in dynamic environments. This work proposes a data-driven approach to NLOS imaging, PathFinder, that can be used with a standard RGB camera mounted on a small, power-constrained mobile robot, such as an aerial drone. Our experimental pipeline is designed to accurately estimate the 2D trajectory of a person who moves in a Manhattan-world environment while remaining hidden from the camera's field-of-view. We introduce a novel approach to process a sequence of dynamic successive frames in a line-of-sight (LOS) video using an attention-based neural network that performs inference in real-time. The method also includes a preprocessing selection metric that analyzes images from a moving camera which contain multiple vertical planar surfaces, such as walls and building facades, and extracts planes that return maximum NLOS information. We validate the approach on in-the-wild scenes using a drone for video capture, thus demonstrating low-cost NLOS imaging in dynamic capture environments.
https://arxiv.org/abs/2404.05024
Robots are being designed to help people in an increasing variety of settings--but seemingly little attention has been given so far to the specific needs of women, who represent roughly half of the world's population but are highly underrepresented in robotics. Here we used a speculative prototyping approach to explore this expansive design space: First, we identified some potential challenges of interest, including crimes and illnesses that disproportionately affect women, as well as potential opportunities for designers, which were visualized in five sketches. Then, one of the sketched scenarios was further explored by developing a prototype, of a robotic helper drone equipped with computer vision to detect hidden cameras that could be used to spy on women. While object detection introduced some errors, hidden cameras were identified with a reasonable accuracy of 80\% (Intersection over Union (IoU) score: 0.40). Our aim is that the identified challenges and opportunities could help spark discussion and inspire designers, toward realizing a safer, more inclusive future through responsible use of technology.
https://arxiv.org/abs/2404.04123
As robotic systems such as autonomous cars and delivery drones assume greater roles and responsibilities within society, the likelihood and impact of catastrophic software failure within those systems is increasing. To aid researchers in the development of new methods to measure and assure the safety and quality of robotics software, we systematically curated a dataset of 221 bugs across 7 popular and diverse software systems implemented via the Robot Operating System (ROS). We produce historically accurate recreations of each of the 221 defective software versions in the form of Docker images, and use a grounded theory approach to examine and categorize their corresponding faults, failures, and fixes. Finally, we reflect on the implications of our findings and outline future research directions for the community.
https://arxiv.org/abs/2404.03629
Traditional machine learning models often require powerful hardware, making them unsuitable for deployment on resource-limited devices. Tiny Machine Learning (tinyML) has emerged as a promising approach for running machine learning models on these devices, but integrating multiple data modalities into tinyML models remains a challenge due to increased complexity, latency, and power consumption. This paper proposes TinyVQA, a novel multimodal deep neural network for visual question answering tasks that can be deployed on resource-constrained tinyML hardware. TinyVQA leverages a supervised attention-based model to learn how to answer questions about images using both vision and language modalities. Distilled knowledge from the supervised attention-based VQA model trains the memory-aware compact TinyVQA model, and a low bit-width quantization technique is employed to further compress the model for deployment on tinyML devices. The TinyVQA model was evaluated on the FloodNet dataset, which is used for post-disaster damage assessment. The compact model achieved an accuracy of 79.5%, demonstrating the effectiveness of TinyVQA for real-world applications. Additionally, the model was deployed on a Crazyflie 2.0 drone equipped with an AI deck and a GAP8 microprocessor. On the drone, the TinyVQA model achieved a low latency of 56 ms and consumed 693 mW of power, showcasing its suitability for resource-constrained embedded systems.
https://arxiv.org/abs/2404.03574
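Low bit-width quantization, used in the TinyVQA abstract above to compress the model, can be sketched as symmetric uniform quantization of a weight tensor. This is a generic illustration only, since the abstract does not specify the exact scheme:

```python
def quantize_uniform(weights, n_bits=4):
    """Symmetric uniform quantization of a list of weights to n_bits.
    Returns (integer codes, scale); dequantize with code * scale."""
    qmax = 2 ** (n_bits - 1) - 1            # e.g. 7 for signed 4-bit
    peak = max((abs(w) for w in weights), default=0.0)
    scale = peak / qmax if peak > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]
```

At 4 bits each weight needs half a byte instead of four, an 8x size reduction, at the cost of a bounded per-weight rounding error of at most `scale / 2` inside the clamped range.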
Autonomous nano-drones (~10 cm in diameter), thanks to their ultra-low-power TinyML-based brains, are capable of coping with real-world environments. However, due to their simplified sensors and compute units, they are still far from the sense-and-act capabilities shown by their bigger counterparts. This system paper presents a novel deep learning-based pipeline that fuses multi-sensorial input (i.e., low-resolution images and an 8x8 depth map) with the robot's state information to tackle a human pose estimation task. Thanks to our design, the proposed system -- trained in simulation and tested on a real-world dataset -- improves on a state-unaware state-of-the-art baseline, increasing the R^2 regression metric on distance prediction by up to 0.10.
https://arxiv.org/abs/2404.02567
In this article, we explore the potential of zero-shot Large Multimodal Models (LMMs) in the domain of drone perception. We focus on person detection and action recognition tasks and evaluate two prominent LMMs, namely YOLO-World and GPT-4V(ision) using a publicly available dataset captured from aerial views. Traditional deep learning approaches rely heavily on large and high-quality training datasets. However, in certain robotic settings, acquiring such datasets can be resource-intensive or impractical within a reasonable timeframe. The flexibility of prompt-based Large Multimodal Models (LMMs) and their exceptional generalization capabilities have the potential to revolutionize robotics applications in these scenarios. Our findings suggest that YOLO-World demonstrates good detection performance. GPT-4V struggles with accurately classifying action classes but delivers promising results in filtering out unwanted region proposals and in providing a general description of the scenery. This research represents an initial step in leveraging LMMs for drone perception and establishes a foundation for future investigations in this area.
https://arxiv.org/abs/2404.01571
Drones may be more advantageous than fixed cameras for quality control applications in industrial facilities, since they can be redeployed dynamically and adjusted to production planning. The practical scenario that has motivated this paper, image acquisition with drones in a car manufacturing plant, requires drone positioning accuracy in the order of 5 cm. During repetitive manufacturing processes, it is assumed that quality control imaging drones will follow highly deterministic periodic paths, stop at predefined points to take images and send them to image recognition servers. Therefore, by relying on prior knowledge about production chain schedules, it is possible to optimize the positioning technologies for the drones to stay at all times within the boundaries of their flight plans, which will be composed of stopping points and the paths in between. This involves mitigating issues such as temporary blocking of line-of-sight between the drone and any existing radio beacons; sensor data noise; and the loss of visual references. We present a self-corrective solution for this purpose. It corrects visual odometer readings based on filtered and clustered Ultra-Wide Band (UWB) data, as an alternative to direct Kalman fusion. The approach combines the advantages of these technologies when at least one of them works properly at any measurement spot. It has three method components: independent Kalman filtering, data association by means of stream clustering and mutual correction of sensor readings based on the generation of cumulative correction vectors. The approach is inspired by the observation that UWB positioning works reasonably well at static spots whereas visual odometer measurements reflect straight displacements correctly but can underestimate their length. Our experimental results demonstrate the advantages of the approach in the application scenario over Kalman fusion.
https://arxiv.org/abs/2404.00426
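The cumulative-correction idea in the abstract above can be sketched as follows: at each static stopping point where a trusted UWB fix is available, the offset between UWB and raw visual odometry becomes the running correction vector applied to all subsequent VO readings. A simplified 2-D sketch of the mechanism only; the one-shot offset update and the names are assumptions, not the paper's full pipeline (which also filters and clusters the UWB stream):

```python
def corrected_trajectory(vo_positions, uwb_fixes):
    """Apply a running correction vector to visual-odometry positions.

    vo_positions: list of (x, y) raw VO estimates, one per step.
    uwb_fixes: dict mapping step index -> trusted (x, y) UWB position,
               available only at static stopping points.

    At each stop, the offset UWB - VO captures the drift accumulated so
    far and becomes the correction added to later VO readings.
    """
    cx = cy = 0.0                      # running correction vector
    out = []
    for step, (x, y) in enumerate(vo_positions):
        if step in uwb_fixes:          # trusted fix at this stopping point
            ux, uy = uwb_fixes[step]
            cx, cy = ux - x, uy - y    # refresh the cumulative correction
        out.append((x + cx, y + cy))
    return out
```

This captures why the scheme works in the scenario described: UWB is reliable exactly at the static spots, while VO is locally consistent between them, so each technology patches the other's weakness.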
This project aims to revolutionize drone flight control by implementing a nonlinear Deep Reinforcement Learning (DRL) agent as a replacement for traditional linear Proportional Integral Derivative (PID) controllers. The primary objective is to seamlessly transition drones between manual and autonomous modes, enhancing responsiveness and stability. We utilize the Proximal Policy Optimization (PPO) reinforcement learning strategy within the Gazebo simulator to train the DRL agent. Adding a $20,000 indoor Vicon tracking system offers <1 mm positioning accuracy, which significantly improves autonomous flight precision. To navigate the drone along the shortest collision-free trajectory, we also build a 3-dimensional A* path planner and successfully deploy it in real flight.
https://arxiv.org/abs/2404.00204
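The 3-dimensional A* planner mentioned in the abstract above can be sketched on a voxel occupancy grid with 6-connected moves and a Manhattan-distance heuristic. A minimal, generic A* sketch, not the authors' implementation:

```python
import heapq

def astar_3d(grid, start, goal):
    """A* over a 3-D voxel grid (nested lists; 1 = obstacle, 0 = free),
    6-connected moves, unit step cost, Manhattan heuristic.
    Returns the list of voxels from start to goal, or None if unreachable."""
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])

    def h(p):
        return sum(abs(a - b) for a, b in zip(p, goal))

    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    open_heap = [(h(start), 0, start)]       # (f = g + h, g, voxel)
    came, g = {start: None}, {start: 0}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:                      # reconstruct path backwards
            path = []
            while cur is not None:
                path.append(cur)
                cur = came[cur]
            return path[::-1]
        for dz, dy, dx in moves:
            nb = (cur[0] + dz, cur[1] + dy, cur[2] + dx)
            if not (0 <= nb[0] < nz and 0 <= nb[1] < ny and 0 <= nb[2] < nx):
                continue                     # outside the grid
            if grid[nb[0]][nb[1]][nb[2]]:
                continue                     # occupied voxel
            if cost + 1 < g.get(nb, float("inf")):
                g[nb] = cost + 1
                came[nb] = cur
                heapq.heappush(open_heap, (cost + 1 + h(nb), cost + 1, nb))
    return None
```

Because the Manhattan heuristic never overestimates the remaining 6-connected cost, the first time the goal is popped the path is shortest.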
The goal of field boundary delineation is to predict the polygonal boundaries and interiors of individual crop fields in overhead remotely sensed images (e.g., from satellites or drones). Automatic delineation of field boundaries is a necessary task for many real-world use cases in agriculture, such as estimating cultivated area in a region or predicting end-of-season yield in a field. Field boundary delineation can be framed as an instance segmentation problem, but presents unique research challenges compared to traditional computer vision datasets used for instance segmentation. The practical applicability of previous work is also limited by the assumption that a sufficiently-large labeled dataset is available where field boundary delineation models will be applied, which is not the reality for most regions (especially under-resourced regions such as Sub-Saharan Africa). We present an approach for segmentation of crop field boundaries in satellite images in regions lacking labeled data that uses multi-region transfer learning to adapt model weights for the target region. We show that our approach outperforms existing methods and that multi-region transfer learning substantially boosts performance for multiple model architectures. Our implementation and datasets are publicly available to enable use of the approach by end-users and serve as a benchmark for future work.
https://arxiv.org/abs/2404.00179
Legal autonomy - the lawful activity of artificial intelligence agents - can be achieved in one of two ways. It can be achieved either by imposing constraints on AI actors such as developers, deployers and users, and on AI resources such as data, or by imposing constraints on the range and scope of the impact that AI agents can have on the environment. The latter approach involves encoding extant rules concerning AI-driven devices into the software of the AI agents controlling those devices (e.g., encoding rules about limitations on zones of operation into the agent software of an autonomous drone). This is a challenge, since the effectiveness of such an approach requires a method of extracting, loading, transforming and computing legal information that is both explainable and legally interoperable, and that enables AI agents to reason about the law. In this paper, we sketch a proof of principle for such a method using large language models (LLMs), expert legal systems known as legal decision paths, and Bayesian networks. We then show how the proposed method could be applied to extant regulation in matters of autonomous cars, such as the California Vehicle Code.
https://arxiv.org/abs/2403.18537
As drone technology advances, using unmanned aerial vehicles for aerial surveys has become the dominant trend in modern low-altitude remote sensing. The surge in aerial video data necessitates accurate prediction of future scenarios and motion states of the target of interest, particularly in applications like traffic management and disaster response. Existing video prediction methods focus solely on predicting future scenes (video frames), neglecting to explicitly model the target's motion states, which is crucial for aerial video interpretation. To address this issue, we introduce a novel task called Target-Aware Aerial Video Prediction, which aims to simultaneously predict future scenes and the motion states of the target. Further, we design a model specifically for this task, named TAFormer, which provides a unified modeling approach for both video and target motion states. Specifically, we introduce Spatiotemporal Attention (STA), which decouples the learning of video dynamics into spatial static attention and temporal dynamic attention, effectively modeling scene appearance and motion. Additionally, we design an Information Sharing Mechanism (ISM), which elegantly unifies the modeling of video and target motion by facilitating information interaction through two sets of messenger tokens. Moreover, to alleviate the difficulty of distinguishing targets in blurry predictions, we introduce a Target-Sensitive Gaussian Loss (TSGL), enhancing the model's sensitivity to both the target's position and content. Extensive experiments on UAV123VP and VisDroneVP (derived from single-object tracking datasets) demonstrate the exceptional performance of TAFormer in target-aware video prediction, showcasing its adaptability to the additional requirements of aerial video interpretation for target awareness.
https://arxiv.org/abs/2403.18238
Automating current bridge visual inspection practices using drones and image processing techniques is a prominent way to make these inspections more effective, robust, and less expensive. In this paper, we investigate the development of a novel deep-learning method for the detection of fatigue cracks in high-resolution images of steel bridges. First, we present a novel and challenging dataset comprising images of cracks in steel bridges. Second, we integrate the ConvNext neural network with a previous state-of-the-art encoder-decoder network for crack segmentation. We study and report the effects of the use of background patches on the network performance when applied to high-resolution images of cracks in steel bridges. Finally, we introduce a loss function that allows the use of more background patches during training, which yields a significant reduction in false positive rates.
https://arxiv.org/abs/2403.17725
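The abstract above describes a loss that tolerates more background patches during training. One common way to get that behavior is to down-weight background pixels in a binary cross-entropy loss, so abundant crack-free patches do not overwhelm the rare crack pixels. The sketch below illustrates that generic idea; `bg_weight` and the weighting scheme are assumptions, not the paper's actual loss:

```python
import math

def weighted_bce(preds, labels, bg_weight=0.1):
    """Pixel-wise binary cross-entropy that down-weights background terms.

    preds: predicted crack probabilities in (0, 1).
    labels: ground-truth labels, 1 = crack pixel, 0 = background.
    bg_weight: assumed multiplier applied to background (label 0) terms.
    """
    total = 0.0
    eps = 1e-7
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1 - eps)          # avoid log(0)
        term = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += term if y == 1 else bg_weight * term
    return total / len(preds)
```

With the background terms scaled down, each extra background patch still teaches the network what "no crack" looks like (reducing false positives) without drowning out the positive-class gradient.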
Time-optimal quadrotor flight is an extremely challenging problem due to the limited control authority encountered at the limit of handling. Model Predictive Contouring Control (MPCC) has emerged as a leading model-based approach for time optimization problems such as drone racing. However, the standard MPCC formulation used in quadrotor racing introduces the notion of the gates directly in the cost function, creating a multi-objective optimization that continuously trades off between maximizing progress and tracking the path accurately. This paper introduces three key components that enhance the MPCC approach for drone racing. First and foremost, we provide safety guarantees in the form of a constraint and terminal set. The safety set is designed as a spatial constraint which prevents gate collisions while allowing for time-optimization only in the cost function. Second, we augment the existing first principles dynamics with a residual term that captures complex aerodynamic effects and thrust forces learned directly from real world data. Third, we use Trust Region Bayesian Optimization (TuRBO), a state of the art global Bayesian Optimization algorithm, to tune the hyperparameters of the MPC controller given a sparse reward based on lap time minimization. The proposed approach achieves similar lap times to the best state-of-the-art RL and outperforms the best time-optimal controller while satisfying constraints. In both simulation and real-world, our approach consistently prevents gate crashes with 100\% success rate, while pushing the quadrotor to its physical limit reaching speeds of more than 80km/h.
https://arxiv.org/abs/2403.17551
Formation trajectory planning using complete graphs to model collaborative constraints becomes computationally intractable as the number of drones increases, due to the curse of dimensionality. To tackle this issue, this paper presents a sparse graph construction method for formation planning that realizes a better efficiency-performance trade-off. First, a sparsification mechanism for complete graphs is designed to ensure the global rigidity of the sparsified graphs, a necessary condition for uniquely corresponding to a geometric shape. Second, a good sparse graph is constructed that sufficiently preserves the main structural features of the complete graph. Since the graph-based formation constraint is described by the Laplacian matrix, the sparse graph construction problem is equivalent to submatrix selection, which has combinatorial time complexity and needs a scoring metric. In comparative simulations, the Max-Trace matrix-revealing metric shows promising performance. The sparse graph is integrated into the formation planning. Simulation results with 72 drones in complex environments demonstrate that when preserving 30\% of connection edges, our method has comparable formation error and recovery performance w.r.t. complete graphs, while planning efficiency is improved by approximately an order of magnitude. Benchmark comparisons and ablation studies are conducted to fully validate the merits of our method.
https://arxiv.org/abs/2403.17288
Nano-drones, distinguished by their agility, minimal weight, and cost-effectiveness, are particularly well-suited for exploration in confined, cluttered and narrow spaces. Recognizing transparent, highly reflective or absorbing materials, such as glass and metallic surfaces is challenging, as classical sensors, such as cameras or laser rangers, often do not detect them. Inspired by bats, which can fly at high speeds in complete darkness with the help of ultrasound, this paper introduces \textit{BatDeck}, a pioneering sensor-deck employing a lightweight and low-power ultrasonic sensor for nano-drone autonomous navigation. This paper first provides insights about sensor characteristics, highlighting the influence of motor noise on the ultrasound readings, then it introduces the results of extensive experimental tests for obstacle avoidance (OA) in a diverse environment. Results show that \textit{BatDeck} allows exploration for a flight time of 8 minutes while covering 136m on average before crash in a challenging environment with transparent and reflective obstacles, proving the effectiveness of ultrasonic sensors for OA on nano-drones.
https://arxiv.org/abs/2403.16696
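A reactive obstacle-avoidance policy of the kind BatDeck enables can be sketched as a simple threshold mapping from a single forward ultrasonic range reading to a velocity command. The thresholds and gains below are made-up illustrative values, not BatDeck's controller:

```python
def avoid_command(range_m, cruise=0.5, caution=1.5, stop=0.5):
    """Map a forward ultrasonic range reading (metres) to a
    (forward_velocity m/s, yaw_rate rad/s) command.

    range_m < stop:    too close, halt and turn in place.
    range_m < caution: slow down proportionally and start turning away.
    otherwise:         clear ahead, cruise straight.
    All thresholds are assumed values for illustration.
    """
    if range_m < stop:
        return (0.0, 0.8)          # stop and rotate away from the obstacle
    if range_m < caution:
        scale = (range_m - stop) / (caution - stop)   # 0 at stop, 1 at caution
        return (cruise * scale, 0.4)
    return (cruise, 0.0)
```

Unlike a camera pipeline, this works identically on glass and metal, since the ultrasonic echo does not care about optical reflectivity.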
This paper proposes an Emergency Battery Service (EBS) for drones, in which an EBS drone flies to a drone in the field with a depleted battery and transfers a fresh battery to the exhausted drone. The authors present a unique battery transfer mechanism and drone localization that uses the Cross Marker Position (CMP) method. The main challenges include a stable and balanced transfer and precise localization of the receiver drone. The proposed EBS drone mitigates the effects of downwash due to the vertical proximity between the drones by implementing diagonal alignment with the receiver, reducing the distance between the two drones to 0.5 m. CFD analysis shows that diagonal rather than perpendicular alignment minimizes turbulence, and the authors verify the actual system through output airflow and thrust measurements. The CMP marker-based localization method enables position lock for the EBS drone with up to 0.9 cm accuracy. The performance of the transfer mechanism is validated experimentally by a successful mid-air transfer in 5 seconds, with the EBS drone within 0.5 m vertical distance of the receiver drone; 4 m/s turbulence does not affect the transfer process.
https://arxiv.org/abs/2403.16430
This paper addresses the problem of target search and tracking using a fleet of cooperating UAVs evolving in some unknown region of interest containing an a priori unknown number of moving ground targets. Each drone is equipped with an embedded Computer Vision System (CVS), providing an image with labeled pixels and a depth map of the observed part of its environment. Moreover, a box containing the corresponding pixels in the image frame is available when a UAV identifies a target. Hypotheses regarding information provided by the pixel classification, depth map construction, and target identification algorithms are proposed to allow its exploitation by set-membership approaches. A set-membership target location estimator is developed using the information provided by the CVS. Each UAV evaluates sets guaranteed to contain the location of the identified targets and a set possibly containing the locations of targets still to be identified. Then, each UAV uses these sets to search and track targets cooperatively.
https://arxiv.org/abs/2403.15113
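The set-membership estimation in the abstract above can be illustrated with the simplest set representation, axis-aligned boxes: each measurement yields a box guaranteed to contain the target, consistent boxes are intersected, and a predict step inflates the box by the target's maximum displacement. A toy 2-D sketch of the principle, not the paper's CVS-based estimator:

```python
def intersect_boxes(a, b):
    """Intersection of two location boxes ((xmin, xmax), (ymin, ymax)).
    If each box is guaranteed to contain the target, so is the
    intersection; None means the measurements are inconsistent."""
    (ax1, ax2), (ay1, ay2) = a
    (bx1, bx2), (by1, by2) = b
    x1, x2 = max(ax1, bx1), min(ax2, bx2)
    y1, y2 = max(ay1, by1), min(ay2, by2)
    if x1 > x2 or y1 > y2:
        return None
    return ((x1, x2), (y1, y2))

def inflate(box, speed, dt):
    """Predict step: grow a box by the target's maximum displacement,
    so it provably still contains a target moving at most `speed`."""
    (x1, x2), (y1, y2) = box
    r = speed * dt
    return ((x1 - r, x2 + r), (y1 - r, y2 + r))
```

Alternating `inflate` (between observations) and `intersect_boxes` (when a UAV observes the target) keeps a guaranteed enclosure of each target's location, which is exactly what the cooperative search uses.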
In this paper, we explore the application of Unmanned Aerial Vehicles (UAVs) in maritime search and rescue (mSAR) missions, focusing on medium-sized fixed-wing drones and quadcopters. We address the challenges and limitations inherent in operating some of the different classes of UAVs, particularly in search operations. Our research includes the development of a comprehensive software framework designed to enhance the efficiency and efficacy of SAR operations. This framework combines preliminary detection onboard UAVs with advanced object detection at ground stations, aiming to reduce visual strain and improve decision-making for operators. It will be made publicly available upon publication. We conduct experiments to evaluate various Region of Interest (RoI) proposal methods, especially by imposing simulated limited bandwidth on them, an important consideration when flying remote or offshore operations. This forces the algorithm to prioritize some predictions over others.
https://arxiv.org/abs/2403.14281
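Prioritizing some predictions over others under a simulated bandwidth limit, as described above, can be sketched as confidence-first selection of RoI crops under a byte budget. The fixed per-RoI cost is an assumption for illustration; real crops vary in size:

```python
def select_rois(proposals, budget_bytes, cost_per_roi=2048):
    """Pick which RoI crops to transmit under a byte budget,
    highest confidence first.

    proposals: list of (confidence, roi_id) tuples.
    cost_per_roi: assumed fixed transmission cost per crop, in bytes.
    Returns the roi_ids chosen for transmission, best first.
    """
    chosen, spent = [], 0
    for conf, roi in sorted(proposals, reverse=True):
        if spent + cost_per_roi <= budget_bytes:
            chosen.append(roi)
            spent += cost_per_roi
    return chosen
```

Under a tight budget the algorithm is forced to drop low-confidence proposals first, which is the trade-off the experiments in the paper above evaluate.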