This paper presents a novel control strategy for drone networks to improve the quality of 3D structures reconstructed from aerial images captured by the drones. Unlike existing coverage control strategies for this purpose, our proposed approach simultaneously controls both the camera orientation and the drone translational motion, enabling more comprehensive perspectives and enhancing the overall quality of the map. We then present a novel problem formulation, including a new performance function to evaluate the drone positions and camera orientations, and design a QP-based controller with a control barrier-like function that constrains the decay rate of the objective function. The present problem formulation poses a new challenge, requiring significantly greater computational effort than the case involving only translational motion control. We approach this issue technologically, namely by introducing JAX, utilizing just-in-time (JIT) compilation and Graphics Processing Unit (GPU) acceleration. Finally, we conduct extensive verification through simulation in ROS (Robot Operating System) and show both the real-time feasibility of the controller and its superiority over the conventional method.
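As a rough illustration of the control pipeline described above (not the authors' code), the sketch below JIT-compiles, with JAX, a QP-based control step that enforces a control-barrier-like constraint on the decay rate of a performance function; the performance function J, the gain alpha, and all dimensions are hypothetical placeholders.

```python
# Minimal sketch, assuming a scalar performance function J of stacked drone
# states: enforce dJ/dt = grad(J)(x) . u <= -alpha * J(x) via a small QP,
# JIT-compiled with JAX so the step can run on a GPU.
import jax
import jax.numpy as jnp

def J(x):
    # Hypothetical performance function; the paper's evaluates drone
    # positions and camera orientations.
    return jnp.sum(x ** 2)

@jax.jit
def qp_controller(x, u_nom, alpha=1.0):
    """min ||u - u_nom||^2  s.t.  grad(J)(x) . u + alpha * J(x) <= 0.

    A single affine inequality admits the closed-form projection below,
    so no iterative QP solver is needed inside the JIT-compiled step.
    """
    a = jax.grad(J)(x)                     # constraint normal grad(J)(x)
    b = alpha * J(x)
    violation = jnp.dot(a, u_nom) + b      # > 0 means u_nom violates the constraint
    scale = jnp.maximum(violation, 0.0) / (jnp.dot(a, a) + 1e-9)
    return u_nom - scale * a               # project u_nom onto the half-space

x = jnp.array([1.0, -2.0, 0.5])
u = qp_controller(x, u_nom=jnp.zeros(3))   # runs on GPU if one is available
```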
https://arxiv.org/abs/2404.13915
Current methods for 3D reconstruction and environmental mapping frequently face challenges in achieving high precision, highlighting the need for practical and effective solutions. In response to this issue, our study introduces FlyNeRF, a system integrating Neural Radiance Fields (NeRF) with drone-based data acquisition for high-quality 3D reconstruction. An unmanned aerial vehicle (UAV) captures images and corresponding spatial coordinates, and the obtained data is subsequently used for the initial NeRF-based 3D reconstruction of the environment. The render quality of the reconstruction is then evaluated by an image evaluation neural network developed within the scope of our system. Based on the output of the image evaluation module, an autonomous algorithm determines positions for additional image capture, thereby improving the reconstruction quality. The neural network introduced for render quality assessment demonstrates an accuracy of 97%. Furthermore, our adaptive methodology enhances the overall reconstruction quality, resulting in an average improvement of 2.5 dB in Peak Signal-to-Noise Ratio (PSNR) for the 10% quantile. FlyNeRF demonstrates promising results, offering advancements in fields such as environmental monitoring, surveillance, and digital twins, where high-fidelity 3D reconstructions are crucial.
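To make the reported metric concrete, here is a small sketch (our reading, not the FlyNeRF code) of per-view PSNR and the 10% quantile over which the 2.5 dB improvement is reported; the per-view values below are illustrative.

```python
# PSNR is computed per rendered view; the 10% quantile then highlights the
# worst-rendered views, which are candidates for additional image capture.
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    """Peak Signal-to-Noise Ratio in dB between two images in [0, max_val]."""
    mse = np.mean((rendered - reference) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# per_view_psnr: hypothetical PSNR values over all evaluation views
per_view_psnr = np.array([18.2, 21.5, 24.9, 26.1, 27.3, 30.0])
q10 = np.quantile(per_view_psnr, 0.10)           # the 10% quantile reported
worst_views = np.where(per_view_psnr <= q10)[0]  # views to re-photograph
```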
https://arxiv.org/abs/2404.12970
Existing neural radiance field (NeRF)-based novel view synthesis methods for large-scale outdoor scenes are mainly built for a single altitude. Moreover, they often require a priori knowledge of the camera shooting height and scene scope, leading to inefficient and impractical applications when the camera altitude changes. In this work, we propose an end-to-end framework, termed AG-NeRF, and seek to reduce the training cost of building good reconstructions by synthesizing free-viewpoint images based on varying altitudes of scenes. Specifically, to tackle the detail variation problem from low altitude (drone-level) to high altitude (satellite-level), a source image selection method and an attention-based feature fusion approach are developed to extract and fuse the most relevant features of the target view from multi-height images for high-fidelity rendering. Extensive experiments demonstrate that AG-NeRF achieves state-of-the-art performance on the 56 Leonard and Transamerica benchmarks and requires only half an hour of training to reach a PSNR competitive with the latest BungeeNeRF.
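The attention-based fusion step can be pictured as standard scaled dot-product attention over source-image features; the sketch below is a generic rendering under assumed shapes, not AG-NeRF's actual module.

```python
# Hedged sketch: target-view features attend over features extracted from
# source images taken at different altitudes, and the fused feature is the
# attention-weighted sum. Shapes and names are illustrative only.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_multi_height(target_feat, source_feats):
    """target_feat: (d,); source_feats: (n_sources, d) from multi-height images.
    Returns a single fused feature weighted by relevance to the target view."""
    d = target_feat.shape[-1]
    scores = source_feats @ target_feat / np.sqrt(d)   # scaled dot-product
    weights = softmax(scores)                          # relevance per source
    return weights @ source_feats                      # attention-weighted sum

fused = fuse_multi_height(np.random.rand(64), np.random.rand(5, 64))
```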
https://arxiv.org/abs/2404.11897
Crop biomass, a critical indicator of plant growth, health, and productivity, is invaluable for crop breeding programs and agronomic research. However, accurate and scalable quantification of crop biomass remains out of reach due to limitations in existing measurement methods. One of the obstacles impeding the advancement of current crop biomass prediction methodologies is the scarcity of publicly available datasets. Addressing this gap, we introduce a new dataset in this domain, the Multi-modality dataset for crop biomass estimation (MMCBE). Comprising 216 sets of multi-view drone images, coupled with LiDAR point clouds and hand-labelled ground truth, MMCBE represents the first multi-modality dataset in the field. It aims to establish benchmark methods for crop biomass quantification and foster the development of vision-based approaches. We have rigorously evaluated state-of-the-art crop biomass estimation methods using MMCBE and ventured into additional potential applications, such as 3D crop reconstruction from drone imagery and novel-view rendering. With this publication, we are making our comprehensive dataset available to the broader community.
https://arxiv.org/abs/2404.11256
In recent years, reports of illegal drones threatening public safety have increased. Against the intrusion of fully autonomous drones, traditional methods such as radio-frequency interference and GPS shielding may fail. This paper proposes a scheme that uses an autonomous multicopter with a strapdown camera to intercept a maneuvering intruder UAV. The interceptor multicopter can autonomously detect and intercept intruders moving at high speed in the air. The strapdown camera avoids the complex mechanical structure of an electro-optical pod, keeping the interceptor multicopter compact. However, the coupling of the camera and multicopter motion makes interception tasks difficult. To solve this problem, an Image-Based Visual Servoing (IBVS) controller is proposed to make the interception fast and accurate. Then, to cope with the time delay of sensor imaging and image processing relative to attitude changes in high-speed scenarios, a Delayed Kalman Filter (DKF) observer is generalized to predict the current image position and increase the update frequency. Finally, Hardware-in-the-Loop (HITL) simulations and outdoor flight experiments verify that this method achieves a high interception accuracy and success rate. In the flight experiments, a high-speed interception is achieved with a terminal speed of 20 m/s.
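The delayed-measurement mechanism can be sketched as follows (an illustrative reconstruction, not the paper's DKF): the filter buffers past states, applies each image measurement at the time it was actually captured, and re-propagates the corrected state forward to the present, raising the effective update rate.

```python
# Minimal sketch, assuming a 1-D constant-velocity model for an image
# coordinate; the actual DKF observer in the paper is more elaborate.
import numpy as np

class DelayedKF:
    def __init__(self, dt):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity model
        self.H = np.array([[1.0, 0.0]])              # we observe position only
        self.Q = 1e-3 * np.eye(2)
        self.R = np.array([[1e-2]])
        self.history = [(np.zeros(2), np.eye(2))]    # buffered (x, P) per step

    def predict(self):
        x, P = self.history[-1]
        self.history.append((self.F @ x, self.F @ P @ self.F.T + self.Q))

    def delayed_update(self, z, lag_steps):
        """Apply measurement z taken lag_steps ago, then re-propagate."""
        k = len(self.history) - 1 - lag_steps
        x, P = self.history[k]
        S = self.H @ P @ self.H.T + self.R
        K = P @ self.H.T @ np.linalg.inv(S)
        x = x + K @ (np.atleast_1d(z) - self.H @ x)
        P = (np.eye(2) - K @ self.H) @ P
        self.history[k] = (x, P)
        for i in range(k + 1, len(self.history)):    # replay prediction steps
            x = self.F @ x
            P = self.F @ P @ self.F.T + self.Q
            self.history[i] = (x, P)

kf = DelayedKF(dt=0.02)
for _ in range(5):
    kf.predict()
kf.delayed_update(z=0.3, lag_steps=3)   # image finished processing late
current_estimate = kf.history[-1][0]
```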
https://arxiv.org/abs/2404.08296
Multi-robot target tracking finds extensive application in scenarios such as environmental surveillance and wildfire management, which require robust practical deployment of multi-robot systems in uncertain and dangerous environments. Traditional approaches often focus on tracking accuracy without modeling or making assumptions about the environment, neglecting potential environmental hazards that cause system failures in real-world deployments. To address this challenge, we investigate multi-robot target tracking in adversarial environments, considering sensing and communication attacks with uncertainty. We design specific strategies to avoid different danger zones and propose a multi-agent tracking framework for such perilous environments. We approximate the probabilistic constraints and formulate practical optimization strategies to address the computational challenges efficiently. We evaluate the performance of our proposed methods in simulation to demonstrate the ability of the robots to adjust their risk-aware behaviors under different levels of environmental uncertainty and risk confidence. The proposed method is further validated via real-world robot experiments in which a team of drones successfully tracks dynamic ground robots while remaining risk-aware of the sensing and/or communication danger zones.
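One common way to turn such probabilistic constraints into tractable deterministic ones, shown below as a hedged sketch (the paper's exact approximation may differ), is to inflate the danger-zone radius by an uncertainty margin tied to the desired risk confidence.

```python
# Conservative surrogate for Pr(robot inside danger zone) <= risk, assuming
# the distance-to-zone error is roughly Gaussian with std dev sigma.
# All names and the Gaussian assumption are ours, not the paper's.
import numpy as np
from scipy.stats import norm

def safe_against_zone(p_mean, sigma, zone_center, zone_radius, risk=0.05):
    margin = norm.ppf(1.0 - risk) * sigma        # z-score * uncertainty
    dist = np.linalg.norm(p_mean - zone_center)  # mean distance to zone center
    return dist >= zone_radius + margin          # deterministic constraint

ok = safe_against_zone(np.array([3.0, 4.0]), sigma=0.5,
                       zone_center=np.zeros(2), zone_radius=2.0)
```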
https://arxiv.org/abs/2404.07880
Forecasting human trajectories in traffic scenes is critical for safety within mixed or fully autonomous systems. Human future trajectories are driven by two major stimuli: social interactions and stochastic goals. Thus, reliable forecasting needs to capture both. Edge-based relation modeling represents social interactions using pairwise correlations from precise individual states. Nevertheless, edge-based relations can be vulnerable under perturbations. To alleviate these issues, we propose a region-based relation learning paradigm that models social interactions via region-wise dynamics of joint states, i.e., the changes in the density of crowds. In particular, region-wise agent joint information is encoded within convolutional feature grids. Social relations are modeled by relating the temporal changes of local joint information from a global perspective. We show that region-based relations are less susceptible to perturbations. To account for stochastic individual goals, we exploit a conditional variational autoencoder to realize multi-goal estimation and diverse future prediction. Specifically, we perform variational inference via the latent distribution, which is conditioned on the correlation between input states and associated target goals. Sampling from the latent distribution enables the framework to reliably capture stochastic behavior in test data. We integrate multi-goal estimation and region-based relation learning to model the two stimuli, social interactions and stochastic goals, in a single prediction framework. We evaluate our framework on the ETH-UCY dataset and the Stanford Drone Dataset (SDD). We show that the diverse predictions better fit the ground truth when the relation module is incorporated. Our framework outperforms state-of-the-art models on SDD by $27.61\%$/$18.20\%$ on the ADE/FDE metrics.
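The multi-goal sampling step rests on the standard CVAE reparameterization trick; the sketch below illustrates it with an assumed latent dimension and a stand-in decoder, not the paper's networks.

```python
# Sampling a conditional latent Gaussian q(z | x) repeatedly produces diverse
# goal estimates and hence diverse trajectory predictions.
import numpy as np

def sample_goals(mu, log_var, decode, n_samples=20):
    """mu, log_var: parameters of the conditional latent Gaussian.
    decode: hypothetical decoder mapping a latent z to a predicted goal."""
    std = np.exp(0.5 * log_var)
    goals = []
    for _ in range(n_samples):
        eps = np.random.randn(*mu.shape)   # reparameterization trick
        z = mu + std * eps                 # one draw from the latent
        goals.append(decode(z))            # one plausible future goal
    return np.stack(goals)                 # diverse multi-goal hypotheses

# Toy decoder: a fixed linear map standing in for the learned network.
W = np.random.randn(2, 16)
goals = sample_goals(np.zeros(16), np.zeros(16), decode=lambda z: W @ z)
```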
https://arxiv.org/abs/2404.06971
FlameFinder is a deep metric learning (DML) framework designed to accurately detect flames, even when obscured by smoke, using thermal images from firefighter drones during wildfire monitoring. Traditional RGB cameras struggle in such conditions, but thermal cameras can capture smoke-obscured flame features. However, they lack absolute thermal reference points, leading to false positives. To address this issue, FlameFinder utilizes paired thermal-RGB images for training. By learning latent flame features from smoke-free samples, the model becomes less biased towards relative thermal gradients. In testing, it identifies flames in smoky patches by analyzing their equivalent thermal-domain distribution. This method improves performance under both supervised and distance-based clustering metrics. The framework incorporates a flame segmentation method and a DML-aided detection framework, utilizing center loss (CL), triplet center loss (TCL), and triplet cosine center loss (TCCL) to identify optimal cluster representatives for classification. However, the dominance of center loss over the other losses leads the model to miss features sensitive to them. To address this limitation, an attention mechanism is proposed. This mechanism allows for non-uniform feature contribution, amplifying the critical role of the cosine and triplet losses in the DML framework. Additionally, it improves interpretability and class discrimination, and decreases intra-class variance. As a result, the proposed model surpasses the baseline by 4.4% on the FLAME2 dataset and 7% on the FLAME3 dataset in unobscured flame detection accuracy. Moreover, it demonstrates enhanced class separation in obscured scenarios compared to VGG19, ResNet18, and three backbone models tailored for flame detection.
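For reference, the center loss and triplet center loss named above can be written in a few lines; the sketch below is a generic rendering with hypothetical class centers, not FlameFinder's implementation (the triplet cosine variant is analogous with cosine distances).

```python
# Center loss pulls a feature toward its class center; triplet center loss
# additionally pushes it away from the nearest other class center.
import numpy as np

def center_loss(f, centers, y):
    """CL: squared distance from feature f to its own class center."""
    return np.sum((f - centers[y]) ** 2)

def triplet_center_loss(f, centers, y, margin=1.0):
    """TCL: own-center distance should beat the closest other center by margin."""
    d_own = np.sum((f - centers[y]) ** 2)
    others = [np.sum((f - c) ** 2) for j, c in enumerate(centers) if j != y]
    return max(0.0, d_own - min(others) + margin)

centers = np.random.randn(3, 8)   # one center per class (hypothetical)
f = np.random.randn(8)            # an embedded thermal-image feature
loss = center_loss(f, centers, 0) + triplet_center_loss(f, centers, 0)
```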
https://arxiv.org/abs/2404.06653
Detecting objects in aerial images poses significant challenges due to the following factors: 1) aerial images typically have very large sizes, generally millions or even hundreds of millions of pixels, while computational resources are limited; 2) the small size of objects leads to insufficient information for effective detection; 3) non-uniform object distributions lead to wasted computation. To address these issues, we propose YOLC (You Only Look Clusters), an efficient and effective framework that builds on an anchor-free object detector, CenterNet. To overcome the challenges posed by large-scale images and non-uniform object distributions, we introduce a Local Scale Module (LSM) that adaptively searches for cluster regions to zoom into for accurate detection. Additionally, we modify the regression loss using the Gaussian Wasserstein distance (GWD) to obtain high-quality bounding boxes. Deformable convolutions and refinement methods are employed in the detection head to enhance the detection of small objects. We perform extensive experiments on two aerial image datasets, VisDrone2019 and UAVDT, to demonstrate the effectiveness and superiority of our proposed approach.
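GWD-based regression treats each axis-aligned box as a 2D Gaussian, for which the 2-Wasserstein distance has a simple closed form; the sketch below shows that form plus one common loss normalization (YOLC's exact loss shaping may differ).

```python
# Each box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and
# covariance diag(w^2/4, h^2/4); for diagonal covariances the squared
# 2-Wasserstein distance reduces to the two terms below.
import numpy as np

def gwd2(box1, box2):
    """Squared 2-Wasserstein distance between the Gaussians of two boxes."""
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    center_term = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    shape_term = ((w1 - w2) ** 2 + (h1 - h2) ** 2) / 4.0
    return center_term + shape_term

def gwd_loss(box_pred, box_gt, tau=1.0):
    # A common normalization mapping the unbounded distance into (0, 1).
    return 1.0 - 1.0 / (tau + np.log1p(gwd2(box_pred, box_gt)))

loss = gwd_loss((10.0, 10.0, 4.0, 2.0), (11.0, 10.0, 4.0, 3.0))
```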
https://arxiv.org/abs/2404.06180
Accurately distinguishing each object is a fundamental goal of multi-object tracking (MOT) algorithms. However, achieving this goal remains challenging, primarily because: (i) in crowded scenes with occluded objects, the high overlap of object bounding boxes leads to confusion among closely located objects. Nevertheless, humans naturally perceive the depth of elements in a scene when observing 2D videos. Inspired by this, even though the bounding boxes of objects are close on the camera plane, we can differentiate them in the depth dimension, thereby establishing a 3D perception of the objects. (ii) For videos with rapid, irregular camera motion, abrupt changes in object positions can result in ID switches. However, if the camera pose is known, we can compensate for the errors of linear motion models. In this paper, we propose \textit{DepthMOT}, which achieves: (i) detecting and estimating the scene depth map \textit{end-to-end}, and (ii) compensating for irregular camera motion via camera pose estimation. Extensive experiments demonstrate the superior performance of DepthMOT on the VisDrone-MOT and UAVDT datasets. The code will be available at \url{this https URL}.
https://arxiv.org/abs/2404.05518
The study of non-line-of-sight (NLOS) imaging is growing due to its many potential applications, including rescue operations and pedestrian detection by self-driving cars. However, implementing NLOS imaging on a moving camera remains an open area of research. Existing NLOS imaging methods rely on time-resolved detectors and laser configurations that require precise optical alignment, making them difficult to deploy in dynamic environments. This work proposes a data-driven approach to NLOS imaging, PathFinder, that can be used with a standard RGB camera mounted on a small, power-constrained mobile robot, such as an aerial drone. Our experimental pipeline is designed to accurately estimate the 2D trajectory of a person who moves in a Manhattan-world environment while remaining hidden from the camera's field of view. We introduce a novel approach to processing a sequence of dynamic successive frames in a line-of-sight (LOS) video using an attention-based neural network that performs inference in real time. The method also includes a preprocessing selection metric that analyzes images from a moving camera containing multiple vertical planar surfaces, such as walls and building facades, and extracts the planes that return maximum NLOS information. We validate the approach on in-the-wild scenes using a drone for video capture, thus demonstrating low-cost NLOS imaging in dynamic capture environments.
https://arxiv.org/abs/2404.05024
Robots are being designed to help people in an increasing variety of settings, but seemingly little attention has been given so far to the specific needs of women, who represent roughly half of the world's population yet are highly underrepresented in robotics. Here we used a speculative prototyping approach to explore this expansive design space: first, we identified some potential challenges of interest, including crimes and illnesses that disproportionately affect women, as well as potential opportunities for designers, which we visualized in five sketches. Then, one of the sketched scenarios was explored further by developing a prototype of a robotic helper drone equipped with computer vision to detect hidden cameras that could be used to spy on women. While object detection introduced some errors, hidden cameras were identified with a reasonable accuracy of 80\% (Intersection over Union (IoU) score: 0.40). Our aim is for the identified challenges and opportunities to help spark discussion and inspire designers, toward realizing a safer, more inclusive future through responsible use of technology.
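For readers unfamiliar with the reported IoU score, here is a plain implementation of the metric for axis-aligned boxes; the example boxes are illustrative.

```python
# Intersection over Union for boxes given as (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area, if any
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# e.g. a predicted hidden-camera box vs. ground truth
score = iou((10, 10, 50, 50), (20, 15, 60, 55))
```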
https://arxiv.org/abs/2404.04123
As robotic systems such as autonomous cars and delivery drones assume greater roles and responsibilities within society, the likelihood and impact of catastrophic software failure within those systems is increasing. To aid researchers in the development of new methods to measure and assure the safety and quality of robotics software, we systematically curated a dataset of 221 bugs across 7 popular and diverse software systems implemented via the Robot Operating System (ROS). We produce historically accurate recreations of each of the 221 defective software versions in the form of Docker images, and use a grounded theory approach to examine and categorize their corresponding faults, failures, and fixes. Finally, we reflect on the implications of our findings and outline future research directions for the community.
https://arxiv.org/abs/2404.03629
Traditional machine learning models often require powerful hardware, making them unsuitable for deployment on resource-limited devices. Tiny Machine Learning (tinyML) has emerged as a promising approach for running machine learning models on these devices, but integrating multiple data modalities into tinyML models remains a challenge due to increased complexity, latency, and power consumption. This paper proposes TinyVQA, a novel multimodal deep neural network for visual question answering tasks that can be deployed on resource-constrained tinyML hardware. TinyVQA leverages a supervised attention-based model to learn how to answer questions about images using both vision and language modalities. Knowledge distilled from the supervised attention-based VQA model trains the memory-aware, compact TinyVQA model, and a low bit-width quantization technique is employed to further compress the model for deployment on tinyML devices. The TinyVQA model was evaluated on the FloodNet dataset, which is used for post-disaster damage assessment. The compact model achieved an accuracy of 79.5%, demonstrating the effectiveness of TinyVQA for real-world applications. Additionally, the model was deployed on a Crazyflie 2.0 drone equipped with an AI deck and a GAP8 microprocessor, where it achieved a low latency of 56 ms and consumed 693 mW, showcasing its suitability for resource-constrained embedded systems.
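Low bit-width quantization of the kind mentioned above can be illustrated with a simple symmetric int8 scheme; TinyVQA's exact quantizer is not specified in the abstract, so the sketch below is an assumption.

```python
# Symmetric per-tensor quantization of float weights to int8, roughly 4x
# smaller than float32 storage, with a bounded round-off error.
import numpy as np

def quantize_int8(w):
    scale = np.max(np.abs(w)) / 127.0   # map the largest weight to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale   # approximate original weights

w = np.random.randn(256, 64).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()  # at most scale / 2 per weight
```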
https://arxiv.org/abs/2404.03574
Autonomous nano-drones (~10 cm in diameter), thanks to their ultra-low-power TinyML-based brains, are capable of coping with real-world environments. However, due to their simplified sensors and compute units, they are still far from the sense-and-act capabilities shown by their bigger counterparts. This system paper presents a novel deep learning-based pipeline that fuses multi-sensorial input (i.e., low-resolution images and an 8x8 depth map) with the robot's state information to tackle a human pose estimation task. Thanks to our design, the proposed system, trained in simulation and tested on a real-world dataset, improves on a state-unaware state-of-the-art baseline by increasing the R^2 regression metric by up to 0.10 on the distance prediction.
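For concreteness, here is the R^2 (coefficient of determination) regression metric in which the reported 0.10 improvement on distance prediction is measured; the values in the example are illustrative.

```python
# R^2 = 1 - SS_res / SS_tot; 1.0 is a perfect fit, 0.0 matches the mean.
import numpy as np

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# e.g. predicted vs. true subject distances in meters
r2 = r2_score(np.array([1.2, 2.0, 3.1]), np.array([1.1, 2.3, 2.9]))
```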
https://arxiv.org/abs/2404.02567
In this article, we explore the potential of zero-shot Large Multimodal Models (LMMs) in the domain of drone perception. We focus on person detection and action recognition tasks and evaluate two prominent LMMs, namely YOLO-World and GPT-4V(ision), using a publicly available dataset captured from aerial views. Traditional deep learning approaches rely heavily on large, high-quality training datasets. However, in certain robotic settings, acquiring such datasets can be resource-intensive or impractical within a reasonable timeframe. The flexibility of prompt-based LMMs and their exceptional generalization capabilities have the potential to revolutionize robotics applications in these scenarios. Our findings suggest that YOLO-World demonstrates good detection performance, while GPT-4V struggles to accurately classify action classes but delivers promising results in filtering out unwanted region proposals and in providing a general description of the scenery. This research represents an initial step in leveraging LMMs for drone perception and establishes a foundation for future investigations in this area.
https://arxiv.org/abs/2404.01571
Drones may be more advantageous than fixed cameras for quality control applications in industrial facilities, since they can be redeployed dynamically and adjusted to production planning. The practical scenario that motivated this paper, image acquisition with drones in a car manufacturing plant, requires drone positioning accuracy on the order of 5 cm. During repetitive manufacturing processes, it is assumed that quality control imaging drones will follow highly deterministic periodic paths, stop at predefined points to take images, and send them to image recognition servers. Therefore, by relying on prior knowledge of production chain schedules, it is possible to optimize the positioning technologies so that the drones stay at all times within the boundaries of their flight plans, which are composed of stopping points and the paths in between. This involves mitigating issues such as temporary blocking of the line of sight between the drone and any existing radio beacons, sensor data noise, and the loss of visual references. We present a self-corrective solution for this purpose. It corrects visual odometry readings based on filtered and clustered Ultra-Wide Band (UWB) data, as an alternative to direct Kalman fusion. The approach combines the advantages of these technologies, provided that at least one of them works properly at each measurement spot. It has three method components: independent Kalman filtering, data association by means of stream clustering, and mutual correction of sensor readings based on the generation of cumulative correction vectors. The approach is inspired by the observation that UWB positioning works reasonably well at static spots, whereas visual odometry measurements reflect straight displacements correctly but can underestimate their length. Our experimental results demonstrate the advantages of the approach over Kalman fusion in the application scenario.
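Our reading of the cumulative-correction idea, as a hedged sketch whose details are assumptions: at stopping points, where UWB is reliable, the offset between the clustered UWB fix and the odometry estimate is folded into a running correction vector that is applied to subsequent odometry readings.

```python
# Illustrative only: a running 2D correction vector updated at static stop
# points from clustered UWB fixes, then applied to later odometry poses.
import numpy as np

class CumulativeCorrector:
    def __init__(self):
        self.correction = np.zeros(2)   # running correction vector

    def at_stop_point(self, uwb_fixes, vo_position):
        """uwb_fixes: (n, 2) filtered/clustered UWB readings at the stop."""
        uwb_position = np.median(uwb_fixes, axis=0)   # robust cluster center
        # Fold the residual offset of the (already corrected) odometry
        # estimate into the cumulative correction.
        self.correction += uwb_position - (vo_position + self.correction)

    def corrected(self, vo_position):
        return vo_position + self.correction

c = CumulativeCorrector()
c.at_stop_point(np.array([[5.02, 3.01], [4.98, 2.97], [5.01, 3.03]]),
                vo_position=np.array([4.80, 2.90]))   # VO underestimated the leg
pose = c.corrected(np.array([6.10, 3.40]))
```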
https://arxiv.org/abs/2404.00426
This project aims to revolutionize drone flight control by implementing a nonlinear Deep Reinforcement Learning (DRL) agent as a replacement for traditional linear Proportional-Integral-Derivative (PID) controllers. The primary objective is to seamlessly transition drones between manual and autonomous modes, enhancing responsiveness and stability. We utilize the Proximal Policy Optimization (PPO) reinforcement learning strategy within the Gazebo simulator to train the DRL agent. Adding a $20,000 indoor Vicon tracking system offers <1 mm positioning accuracy, which significantly improves autonomous flight precision. To navigate the drone along the shortest collision-free trajectory, we also build a three-dimensional A* path planner and successfully deploy it in real flight.
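A compact sketch of a 3D grid A* planner of the kind described follows; the project's actual grid resolution, neighborhood, and heuristic are not given in the abstract, so the choices here are assumptions.

```python
# 3D A* over an occupancy grid with a 26-connected neighborhood and an
# admissible Euclidean heuristic; returns the shortest collision-free path.
import heapq
import numpy as np

def astar_3d(grid, start, goal):
    """grid: 3D numpy array, 0 = free, 1 = obstacle. start/goal: (x, y, z)."""
    def h(p):
        return float(np.linalg.norm(np.subtract(p, goal)))

    moves = [(dx, dy, dz) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
             for dz in (-1, 0, 1) if (dx, dy, dz) != (0, 0, 0)]
    open_set = [(h(start), 0.0, start, None)]
    came_from, g_cost = {}, {start: 0.0}
    while open_set:
        _, g, cur, parent = heapq.heappop(open_set)
        if cur in came_from:               # already expanded with a better cost
            continue
        came_from[cur] = parent
        if cur == goal:                    # reconstruct path back to start
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for m in moves:
            nxt = tuple(np.add(cur, m))
            if any(c < 0 or c >= s for c, s in zip(nxt, grid.shape)):
                continue                   # outside the grid
            if grid[nxt]:
                continue                   # occupied cell
            ng = g + float(np.linalg.norm(m))
            if ng < g_cost.get(nxt, np.inf):
                g_cost[nxt] = ng
                heapq.heappush(open_set, (ng + h(nxt), ng, nxt, cur))
    return None                            # no collision-free path exists

grid = np.zeros((10, 10, 5), dtype=int)
grid[5, :, :] = 1                          # a wall across the space
grid[5, 9, :] = 0                          # with one gap to fly through
path = astar_3d(grid, (0, 0, 0), (9, 9, 4))
```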
https://arxiv.org/abs/2404.00204
The goal of field boundary delineation is to predict the polygonal boundaries and interiors of individual crop fields in overhead remotely sensed images (e.g., from satellites or drones). Automatic delineation of field boundaries is a necessary task for many real-world use cases in agriculture, such as estimating cultivated area in a region or predicting end-of-season yield in a field. Field boundary delineation can be framed as an instance segmentation problem, but it presents unique research challenges compared to traditional computer vision datasets used for instance segmentation. The practical applicability of previous work is also limited by the assumption that a sufficiently large labeled dataset is available in the region where field boundary delineation models will be applied, which is not the reality for most regions (especially under-resourced regions such as Sub-Saharan Africa). We present an approach for segmenting crop field boundaries in satellite images in regions lacking labeled data that uses multi-region transfer learning to adapt model weights for the target region. We show that our approach outperforms existing methods and that multi-region transfer learning substantially boosts performance across multiple model architectures. Our implementation and datasets are publicly available to enable use of the approach by end-users and to serve as a benchmark for future work.
https://arxiv.org/abs/2404.00179
Legal autonomy - the lawful activity of artificial intelligence agents - can be achieved in one of two ways. It can be achieved either by imposing constraints on AI actors such as developers, deployers, and users, and on AI resources such as data, or by imposing constraints on the range and scope of the impact that AI agents can have on the environment. The latter approach involves encoding extant rules concerning AI-driven devices into the software of the AI agents controlling those devices (e.g., encoding rules about limitations on zones of operation into the agent software of an autonomous drone). This is a challenge, since the effectiveness of such an approach requires a method of extracting, loading, transforming, and computing legal information that would be both explainable and legally interoperable, and that would enable AI agents to reason about the law. In this paper, we sketch a proof of principle for such a method using large language models (LLMs), expert legal systems known as legal decision paths, and Bayesian networks. We then show how the proposed method could be applied to extant regulation in matters of autonomous cars, such as the California Vehicle Code.
https://arxiv.org/abs/2403.18537