Monocular 3D object detection (Mono3D) on mobile platforms (e.g., a vehicle, a drone, or a robot) is an important yet challenging task. Existing transformer-based offline Mono3D models adopt grid-based vision tokens, which is suboptimal when coarse tokens must be used due to limited computational power. In this paper, we propose an online Mono3D framework, called MonoATT, which leverages a novel vision transformer with heterogeneous tokens of varying shapes and sizes to facilitate mobile Mono3D. The core idea of MonoATT is to adaptively assign finer tokens to areas of greater significance before utilizing a transformer to enhance Mono3D. To this end, we first use prior knowledge to design a scoring network that selects the most important areas of the image, and then propose a token clustering and merging network with an attention mechanism to gradually merge tokens around the selected areas in multiple stages. Finally, a pixel-level feature map is reconstructed from the heterogeneous tokens before a SOTA Mono3D detector is employed as the underlying detection core. Experimental results on the real-world KITTI dataset demonstrate that MonoATT effectively improves Mono3D accuracy for both near and far objects while guaranteeing low latency. MonoATT outperforms state-of-the-art methods by a large margin and is ranked number one on the KITTI 3D benchmark.
https://arxiv.org/abs/2303.13018
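To make the adaptive-token idea concrete, here is a minimal numpy sketch of one coarsening stage: fine tokens are kept where a scoring function rates the image as important, and the remaining tokens are pooled into a few coarse ones. The clustering rule, shapes, and names are illustrative stand-ins, not MonoATT's actual attention-based clustering and merging network.

```python
import numpy as np

def merge_tokens(tokens, scores, keep_ratio=0.25, num_coarse=16):
    """Keep fine tokens where the scoring network says the image matters,
    and pool the remaining tokens into a few coarse ones. The clustering
    rule here is plain k-means-style averaging, not the paper's
    attention-based clustering-and-merging network."""
    n = len(tokens)
    k = max(1, int(n * keep_ratio))
    order = np.argsort(scores)[::-1]
    fine = tokens[order[:k]]                     # high-score areas keep fine tokens
    rest = tokens[order[k:]]                     # low-score areas get merged
    rng = np.random.default_rng(0)
    centroids = rest[rng.choice(len(rest), num_coarse, replace=False)]
    for _ in range(5):                           # a few refinement passes
        assign = ((rest[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
        for c in range(num_coarse):
            members = rest[assign == c]
            if len(members):
                centroids[c] = members.mean(0)
    return np.concatenate([fine, centroids])     # heterogeneous token set

tokens = np.random.randn(1024, 64).astype(np.float32)  # e.g., a 32x32 token grid
scores = np.random.rand(1024)                          # stand-in scoring network
print(merge_tokens(tokens, scores).shape)              # (272, 64)
```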
Data collected during Hurricane Ian (2022) quantifies the demands that small uncrewed aerial systems (UAS), or drones, place on the network communication infrastructure and identifies gaps in the field. Drones have been used increasingly for disaster response since Hurricane Katrina (2005); however, getting the data from the drone to the appropriate decision makers throughout incident command in a timely fashion has been problematic. These delays have persisted even as countries such as the USA have made significant investments in wireless infrastructure, rapidly deployable nodes, and commercial satellite solutions. Hurricane Ian serves as a case study of the mismatch between communications needs and capabilities. In the first four days of the response, nine drone teams flew 34 missions under the direction of the State of Florida FL-UAS1, generating 636 GB of data. The teams had access to six different wireless communications networks but had to resort to physically transferring data to the nearest intact emergency operations center in order to make the data available to the relevant agencies. The analysis of the mismatch contributes a model of the drone data-to-decision workflow in a disaster and quantifies wireless network communication requirements throughout the workflow in terms of five factors. Four of the factors (availability, bandwidth, burstiness, and spatial distribution) were previously identified from analyses of Hurricanes Harvey (2017) and Michael (2018). This work adds upload rate as a fifth attribute. The analysis is expected to improve drone design and edge computing schemes as well as inform wireless communication research and development.
https://arxiv.org/abs/2303.12937
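A quick back-of-envelope on the reported figures shows why upload rate matters as a fifth factor; the one-hour per-mission upload window below is an assumed illustration, not a number from the study.

```python
# Back-of-envelope on the figures above (decimal GB; the 1-hour upload
# window per mission is an assumed illustration, not a number from the study).
total_gb, missions, days = 636, 34, 4

per_mission_gb = total_gb / missions                    # ~18.7 GB per mission
sustained_mbps = total_gb * 8e9 / (days * 86400) / 1e6  # ~14.7 Mbps nonstop
burst_mbps = per_mission_gb * 8e9 / 3600 / 1e6          # ~41.6 Mbps to clear
                                                        # one mission in an hour
print(f"{per_mission_gb:.1f} GB/mission, {sustained_mbps:.1f} Mbps sustained, "
      f"{burst_mbps:.1f} Mbps burst")
```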
We present a demonstration of service-based trajectory planning for a drone delivery system in a multi-drone skyway network. We conduct several experiments using Crazyflie drones to collect the drones' position data, wind speed and direction, and the effects of wind on voltage consumption rates. The experiments are run for varying numbers of recharging stations and varying wind speeds and directions in a multi-drone skyway network. Demo: this https URL
https://arxiv.org/abs/2303.11514
Autonomous navigation of drones using computer vision has achieved promising performance. Nano-sized drones based on edge computing platforms are lightweight, flexible, and cheap, and thus suitable for exploring narrow spaces. However, due to their extremely limited computing power and storage, vision algorithms designed for high-performance GPU platforms cannot be used on nano drones. To address this issue, this paper presents a lightweight CNN depth estimation network deployed on nano drones for obstacle avoidance. Inspired by Knowledge Distillation (KD), a Channel-Aware Distillation Transformer (CADiT) is proposed to help the small network learn knowledge from a larger one. The proposed method is validated on the KITTI dataset and tested on a Crazyflie nano drone with an ultra-low-power GAP8 microprocessor.
https://arxiv.org/abs/2303.10386
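As a rough illustration of the distillation idea, the sketch below combines the standard temperature-softened KD loss with a channel-statistics alignment term; it assumes matched feature dimensions and does not reproduce CADiT's actual channel-aware transformer blocks.

```python
import torch
import torch.nn.functional as F

def kd_losses(student_feat, teacher_feat, student_logits, teacher_logits, T=4.0):
    """Generic KD terms in the spirit of the abstract: soften logits for the
    response loss and align per-channel feature statistics. Matched feature
    shapes are assumed; CADiT's transformer blocks are not reproduced."""
    # Response distillation: KL between temperature-softened distributions.
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T
    # Channel statistics: pool each (B, C, H, W) map to (B, C) and match
    # the channel-wise attention the distillation is meant to transfer.
    s_ch = F.softmax(student_feat.mean(dim=(2, 3)), dim=1)
    t_ch = F.softmax(teacher_feat.mean(dim=(2, 3)), dim=1)
    return kl, F.mse_loss(s_ch, t_ch)

s_feat, t_feat = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
s_log, t_log = torch.randn(2, 10), torch.randn(2, 10)
print(kd_losses(s_feat, t_feat, s_log, t_log))
```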
We demonstrate formation flying for drone swarm services. A set of drones fly in four different swarm formations. A dataset is collected to study the effect of formation flying on energy consumption. We conduct a set of experiments to study the effect of wind on formation flying and examine the forces the drones exert on each other when flying in formation. We finally identify and classify the formations that conserve the most energy under varying wind conditions. The collected dataset aims to provide researchers with data for further research in swarm-based drone service delivery. Demo: this https URL
https://arxiv.org/abs/2303.09694
We present an image dehazing algorithm of high quality and wide applicability that requires no training data or priors. We analyze the defects of the original dehazing model and propose a new, reliable dehazing reconstruction and dehazing model that combines an optical scattering model with a computer graphics lighting-rendering model. Based on the new haze model and the images obtained by the cameras, we can reconstruct the three-dimensional space, accurately locate the objects and haze within it, and use the transparency relationship of the haze to perform accurate haze removal. To obtain a 3D simulation dataset, we used the Unreal 5 computer graphics rendering engine. To obtain real-shot data in different scenes, we used fog generators, camera arrays, mobile phones, underwater cameras, and drones to capture haze data. We use formula derivations together with experimental results on the simulated and real-shot datasets to demonstrate the feasibility of the new method. Compared with various other methods, ours is far ahead on quantitative metrics (an average of 4 dB higher quality per scene), preserves more natural colors, is more robust across scenarios, and performs best in subjective perception.
https://arxiv.org/abs/2303.09153
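The removal step builds on the classic optical scattering model I = J*t + A*(1 - t); the sketch below only inverts that model given a transmission map and atmospheric light, whereas the paper's contribution is estimating these quantities from the reconstructed 3D scene.

```python
import numpy as np

def dehaze(hazy, transmission, airlight, t_min=0.1):
    """Invert the classic scattering model I = J*t + A*(1 - t) for the
    haze-free radiance J. The transmission map t and atmospheric light A
    are given here; estimating them from 3D geometry is the paper's part."""
    t = np.clip(transmission, t_min, 1.0)[..., None]  # avoid amplifying noise
    return np.clip((hazy - airlight) / t + airlight, 0.0, 1.0)

hazy = np.random.rand(240, 320, 3)   # stand-in for a camera frame in [0, 1]
t = np.full((240, 320), 0.6)         # stand-in for per-pixel transmission
A = np.array([0.9, 0.9, 0.92])       # stand-in for atmospheric light
print(dehaze(hazy, t, A).shape)
```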
Agriculture plays an important role in the food supply and economy of Bangladesh. The rapid growth of the population over the years has also increased the demand for food production. One of the major reasons behind low crop production is the large number of bacterial, viral, and fungal plant diseases. Early detection of plant diseases and proper usage of pesticides and fertilizers are vital for preventing disease and boosting yield. Most farmers apply generalized pesticides and fertilizers across entire fields without specifically knowing the condition of the plants. Thus the production cost often increases, and sometimes this even harms the yield. Deep learning models have proven very effective at automatically detecting plant diseases from images of plants, thereby reducing the need for human specialists. This paper aims at building a lightweight deep learning model for predicting leaf disease in tomato plants. By modifying the region-based convolutional neural network, we design an efficient and effective model that demonstrates satisfactory empirical performance on a benchmark dataset. Our proposed model can easily be deployed in a larger system where drones capture images of leaves that are fed into our model to assess plant health.
https://arxiv.org/abs/2303.09063
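For a sense of what a lightweight region-based detector looks like in practice, here is a minimal torchvision (>= 0.13) sketch of Faster R-CNN over a MobileNetV3 backbone; the backbone choice and the 10-class head (e.g., nine diseases plus background) are assumed placeholders, not the paper's configuration.

```python
import torch
from torchvision.models.detection import fasterrcnn_mobilenet_v3_large_fpn

# A lightweight region-based detector of the kind the abstract describes:
# Faster R-CNN over a MobileNetV3-Large FPN backbone. The 10-class head
# is an assumed placeholder, and no pretrained weights are loaded so the
# snippet runs offline.
model = fasterrcnn_mobilenet_v3_large_fpn(weights=None, weights_backbone=None,
                                          num_classes=10)
model.eval()

with torch.no_grad():
    leaf_images = [torch.rand(3, 512, 512)]   # one RGB leaf image in [0, 1]
    pred = model(leaf_images)[0]              # dict with boxes, labels, scores
    print(pred["boxes"].shape, pred["scores"].shape)
```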
The adoption process of innovative software-intensive technologies raises complex trust concerns in many forms. Perceived safety plays a fundamental role in technology adoption, and it is especially crucial for innovative software-driven technologies characterized by a high degree of dynamism and unpredictability, like collaborating autonomous systems. These systems need to synchronize their maneuvers in order to collaboratively react to unpredictable incoming hazardous situations. That is, however, only possible in the presence of mutual trust. In this paper, we propose an approach for machine-to-machine dynamic trust assessment for collaborating autonomous systems that supports trust building based on the concept of dynamic safety assurance within the collaborative process among software-intensive autonomous systems. In our approach, we leverage the concept of digital twins: abstract models fed with real-time data and used for run-time dynamic exchange of information. The information exchange is performed through the execution of specialized models that embed the necessary safety properties. In particular, we examine the possible role of digital twins in machine-to-machine trust building and present their design for supporting dynamic trust assessment of autonomous drones. Ultimately, we present a proof of concept of direct and indirect trust assessment employing the digital twin in a use case involving two autonomous collaborating drones.
https://arxiv.org/abs/2303.12805
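As an illustration only, the toy twin below combines first-hand safety observations (direct trust) with peer referrals (indirect trust) into a single score; the data structure, neutral prior, and weighting are assumptions and do not reflect the paper's assurance models.

```python
from dataclasses import dataclass, field

@dataclass
class DroneTwin:
    """Toy digital twin holding run-time safety evidence about a peer drone.
    The 0.5 neutral prior and the 70/30 weighting are assumed conventions,
    not the paper's assurance model."""
    drone_id: str
    observations: list = field(default_factory=list)  # 1 = maneuver met the
                                                      # embedded safety property
    referrals: dict = field(default_factory=dict)     # peer_id -> reported trust

    def direct_trust(self):
        # First-hand evidence gathered through the twin's real-time data feed.
        return sum(self.observations) / len(self.observations) if self.observations else 0.5

    def trust(self, w_direct=0.7):
        # Blend first-hand (direct) and third-party (indirect) evidence.
        indirect = (sum(self.referrals.values()) / len(self.referrals)
                    if self.referrals else 0.5)
        return w_direct * self.direct_trust() + (1 - w_direct) * indirect

twin = DroneTwin("drone-B", observations=[1, 1, 1, 0], referrals={"drone-C": 0.9})
print(f"trust in drone-B: {twin.trust():.2f}")  # collaborate only above a threshold
```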
Biological sensing and processing is asynchronous and sparse, leading to low-latency and energy-efficient perception and action. In robotics, neuromorphic hardware for event-based vision and spiking neural networks promises to exhibit similar characteristics. However, robotic implementations have been limited to basic tasks with low-dimensional sensory inputs and motor actions due to the restricted network size in current embedded neuromorphic processors and the difficulties of training spiking neural networks. Here, we present the first fully neuromorphic vision-to-control pipeline for controlling a freely flying drone. Specifically, we train a spiking neural network that accepts high-dimensional raw event-based camera data and outputs low-level control actions for performing autonomous vision-based flight. The vision part of the network, consisting of five layers and 28.8k neurons, maps incoming raw events to ego-motion estimates and is trained with self-supervised learning on real event data. The control part consists of a single decoding layer and is learned with an evolutionary algorithm in a drone simulator. Robotic experiments show a successful sim-to-real transfer of the fully learned neuromorphic pipeline. The drone can accurately follow different ego-motion setpoints, allowing for hovering, landing, and maneuvering sideways, even while yawing at the same time. The neuromorphic pipeline runs on board on Intel's Loihi neuromorphic processor with an execution frequency of 200 Hz, spending only 27 µJ per inference. These results illustrate the potential of neuromorphic sensing and processing for enabling smaller, more intelligent robots.
https://arxiv.org/abs/2303.08778
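The reported efficiency is easy to sanity-check: at 27 µJ per inference and a 200 Hz execution rate, the pipeline's compute draw works out to a few milliwatts.

```python
# Sanity check on the efficiency figures above: 27 uJ per inference at a
# 200 Hz execution rate implies a compute draw of only a few milliwatts.
energy_per_inference = 27e-6          # joules
rate = 200                            # inferences per second
print(f"{energy_per_inference * rate * 1e3:.1f} mW")   # 5.4 mW
```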
Detecting objects in aerial images is challenging because they are typically composed of crowded small objects distributed non-uniformly over high-resolution images. Density cropping is a widely used method to improve such small-object detection, where the crowded small-object regions are extracted and processed in high resolution. However, this is typically accomplished by adding other learnable components, thus complicating training and inference over a standard detection process. In this paper, we propose an efficient Cascaded Zoom-in (CZ) detector that re-purposes the detector itself for density-guided training and inference. During training, density crops are located, labeled as a new class, and employed to augment the training dataset. During inference, the density crops are first detected along with the base-class objects, and then input to a second stage of inference. This approach is easily integrated into any detector and, like the uniform cropping approach popular in aerial image detection, creates no significant change in the standard detection process. Experimental results on the aerial images of the challenging VisDrone and DOTA datasets verify the benefits of the proposed approach. The proposed CZ detector also provides state-of-the-art results over uniform cropping and other density cropping methods on the VisDrone dataset, increasing the detection mAP of small objects by more than 3 points.
https://arxiv.org/abs/2303.08747
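The two-stage inference can be summarized in a short skeleton: density crops come back from the detector as just another class, get zoomed, and are re-detected by the same model. The stub detector, the crude upsampling, and the omission of the final NMS fusion are all simplifications.

```python
import numpy as np

def remap(box, origin, zoom):
    """Map (x0, y0, x1, y1) from zoomed-crop pixels back to image pixels."""
    ox, oy = origin
    x0, y0, x1, y1 = box
    return (ox + x0 / zoom, oy + y0 / zoom, ox + x1 / zoom, oy + y1 / zoom)

def cascaded_zoom_inference(detect, image, zoom=2, crop_label="density_crop"):
    """Skeleton of the two-stage inference described above. Stage one returns
    base objects plus density-crop regions (trained as an extra class);
    stage two re-runs the *same* detector on each crop upscaled by `zoom`.
    The paper's NMS-based fusion of the two stages is omitted."""
    final = []
    for det in detect(image):
        box, label = det["box"], det["label"]
        if label != crop_label:
            final.append(det)
            continue
        x0, y0, x1, y1 = map(int, box)
        crop = image[y0:y1, x0:x1]
        upscaled = np.kron(crop, np.ones((zoom, zoom, 1)))   # crude upsampling
        for d in detect(upscaled):
            if d["label"] != crop_label:
                d["box"] = remap(d["box"], (x0, y0), zoom)
                final.append(d)
    return final

# Stub detector so the skeleton runs end to end.
def fake_detect(img):
    h, w = img.shape[:2]
    return [{"box": (0, 0, w // 4, h // 4), "label": "car"},
            {"box": (w // 2, h // 2, w - 1, h - 1), "label": "density_crop"}]

print(cascaded_zoom_inference(fake_detect, np.zeros((400, 600, 3))))
```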
Optical identification is often done with spatial or temporal visual pattern recognition and localization. Temporal pattern recognition, depending on the technology, involves a trade-off between communication frequency, range, and accurate tracking. We propose a solution with light-emitting beacons that improves this trade-off by exploiting fast event-based cameras and, for tracking, sparse neuromorphic optical flow computed with spiking neurons. In an asset-monitoring use case, we demonstrate that the system, embedded in a simulated drone, is robust to relative movements and enables simultaneous communication with, and tracking of, multiple moving beacons. Finally, in a hardware lab prototype, we achieve state-of-the-art optical camera communication frequencies in the kHz range.
https://arxiv.org/abs/2303.07169
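For the communication side, a beacon's identity can be read from the timing of its blink events; the sketch below is a plain timestamp-difference decoder, not the paper's spiking-neuron optical-flow tracker.

```python
import numpy as np

def beacon_frequency(timestamps_us):
    """Recover a beacon's blink frequency from the timestamps (in
    microseconds) of ON events at its tracked pixel location, assuming one
    ON event per blink period."""
    periods = np.diff(np.sort(timestamps_us)) * 1e-6   # seconds per period
    return 1.0 / np.median(periods)

# Simulated ON events from a 2 kHz beacon with timing jitter.
rng = np.random.default_rng(1)
events_us = np.cumsum(rng.normal(500.0, 10.0, size=200))   # ~500 us period
print(f"{beacon_frequency(events_us):.0f} Hz")             # ~2000 Hz
```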
This work focuses on the persistent monitoring problem, where a set of targets moving based on an unknown model must be monitored by an autonomous mobile robot with a limited sensing range. To keep each target's position estimate as accurate as possible, the robot needs to adaptively plan its path to (re-)visit all the targets and update its belief from measurements collected along the way. In doing so, the main challenge is to strike a balance between exploitation, i.e., re-visiting previously-located targets, and exploration, i.e., finding new targets or re-acquiring lost ones. Encouraged by recent advances in deep reinforcement learning, we introduce an attention-based neural solution to the persistent monitoring problem, where the agent can learn the inter-dependencies between targets, i.e., their spatial and temporal correlations, conditioned on past measurements. This endows the agent with the ability to determine which target, time, and location to attend to across multiple scales, which we show also helps relax the usual limitations of a finite target set. We experimentally demonstrate that our method outperforms other baselines in terms of the number of target visits and average estimation error in complex environments. Finally, we implement and validate our model in a drone-based simulation experiment to monitor mobile ground targets in a high-fidelity simulator.
https://arxiv.org/abs/2303.06350
Power line detection is a critical inspection task for electricity companies and is also useful for drone obstacle avoidance. Accurately separating power lines from the surrounding area in aerial images is still challenging due to the intricate background and low pixel ratio. In order to properly capture the guidance of the spatial edge detail prior and line features, we offer PL-UNeXt, a power line segmentation model with a booster training strategy. We design edge detail heads that compute the loss in edge space to guide lower-level detail learning, and line feature heads that generate auxiliary segmentation masks to supervise higher-level line feature learning. Benefiting from this design, our model reaches a 70.6 F1 score (+1.9%) on TTPLA and 68.41 mIoU (+5.2%) on VITL (without utilizing IR images), while preserving real-time performance thanks to its small number of inference parameters.
https://arxiv.org/abs/2303.04413
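The booster strategy amounts to a weighted multi-head training loss; the sketch below uses BCE for every head and illustrative weights, since the abstract does not specify the exact loss forms.

```python
import torch
import torch.nn.functional as F

def booster_loss(line_logits, edge_logits, aux_logits, line_gt, edge_gt,
                 w_edge=0.5, w_aux=0.5):
    """Sketch of the booster training strategy described above: the main
    line-segmentation loss is supplemented by an edge-detail head supervised
    in edge space and an auxiliary line-feature head. Loss types and weights
    are illustrative; the auxiliary heads are dropped at inference time."""
    main = F.binary_cross_entropy_with_logits(line_logits, line_gt)
    edge = F.binary_cross_entropy_with_logits(edge_logits, edge_gt)
    aux = F.binary_cross_entropy_with_logits(aux_logits, line_gt)
    return main + w_edge * edge + w_aux * aux

b, h, w = 2, 128, 128
line_gt = torch.randint(0, 2, (b, 1, h, w)).float()
edge_gt = torch.randint(0, 2, (b, 1, h, w)).float()
logits = [torch.randn(b, 1, h, w) for _ in range(3)]
print(booster_loss(*logits, line_gt, edge_gt).item())
```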
We present a novel optimization algorithm called DroNeRF for the autonomous positioning of monocular camera drones around an object for real-time 3D reconstruction using only a few images. Neural Radiance Fields (NeRF) is a novel view synthesis technique used to generate new views of an object or scene from a set of input images. Using drones in conjunction with NeRF provides a unique and dynamic way to generate novel views of a scene, especially when drone movement is restricted. Our approach focuses on calculating optimized poses for individual drones while depending solely on the object geometry, without using any external localization system. The unique camera positioning during the data-capturing phase significantly impacts the quality of the 3D model. To evaluate the quality of our generated novel views, we compute different perceptual metrics such as the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM). Our work demonstrates the benefit of optimally placing drones with limited mobility to generate perceptually better results.
https://arxiv.org/abs/2303.04322
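Both perceptual metrics are standard and easy to reproduce with scikit-image; the grayscale image pair below is synthetic, standing in for a ground-truth view and a NeRF rendering.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# The two perceptual metrics used above, computed on a synthetic
# ground-truth / rendered-view pair (grayscale for brevity).
gt = np.random.rand(256, 256)
rendered = np.clip(gt + np.random.normal(0, 0.05, gt.shape), 0, 1)

psnr = peak_signal_noise_ratio(gt, rendered, data_range=1.0)
ssim = structural_similarity(gt, rendered, data_range=1.0)
print(f"PSNR: {psnr:.1f} dB, SSIM: {ssim:.3f}")
```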
Inertial sensors have been widely deployed on smartphones, drones, robots, and IoT devices. Owing to their importance in ubiquitous and robust localization, inertial-sensor-based positioning is key in many applications, including personal navigation, location-based security, and human-device interaction. However, inertial positioning suffers from the so-called error-drift problem: the measurements of low-cost MEMS inertial sensors are corrupted by various inevitable error sources, leading to unbounded drift when doubly integrated in traditional inertial navigation algorithms. Recently, with increasing sensor data and computational power, the fast development of deep learning has spurred a large body of research introducing deep learning to tackle the problem of inertial positioning. The relevant literature spans the areas of mobile computing, robotics, and machine learning. This article comprehensively reviews relevant work on deep-learning-based inertial positioning, connects the efforts from different fields, and covers how deep learning can be applied to sensor calibration, positioning error-drift reduction, and sensor fusion. Finally, we provide insights on the benefits and limitations of existing work and indicate future opportunities in this direction.
https://arxiv.org/abs/2303.03757
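The "unbounded drift" claim follows directly from double integration: a constant accelerometer bias b grows into a position error of 0.5 * b * t^2. A quick illustration with an assumed bias:

```python
# Why the drift is "unbounded": a constant accelerometer bias b, doubly
# integrated, yields a position error of 0.5 * b * t**2, growing
# quadratically with time rather than leveling off.
bias = 0.02                     # m/s^2, an assumed uncalibrated MEMS bias
for t in (10, 60, 300):         # seconds of pure inertial dead reckoning
    print(f"after {t:3d} s: {0.5 * bias * t**2:7.1f} m position error")
```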
Quantiles of a natural phenomenon can provide scientists with an important understanding of typical, extreme, or other spreads of concentrations. When a group has several available robots, or teams of scientists come together to study a particular environment, it may be advantageous to pool robot resources in a collaborative way to improve performance. A multirobot team can be difficult to bring together and coordinate in practice, especially when robot communication is involved. To this end, we present a study, across several axes, of the impact of using multiple robots to estimate quantiles of a distribution of interest using an informative path planning formulation. We measure quantile estimation accuracy with increasing team size to understand what benefits result from a multirobot approach in a drone exploration task of analyzing the algae concentration in lakes. We additionally analyze several parameters, including the spread of robot initial positions, the planning budget, and inter-robot communication, and find that while using more robots generally results in lower estimation error, this benefit is achieved only under certain conditions. We present our findings in the context of real field robotic applications and discuss the implications of the results and interesting directions for future work.
https://arxiv.org/abs/2303.03539
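A stripped-down version of the estimation task shows the team-size effect; random sampling stands in for the informative path planner, and the lognormal "lake" is synthetic.

```python
import numpy as np

# Minimal version of the task above: each robot samples algae concentration,
# measurements are pooled, and the team's quantile estimates are compared
# against the true field. Random sampling replaces the informative planner,
# so the error generally (not strictly) shrinks as the team grows.
rng = np.random.default_rng(0)
true_field = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # stand-in lake
quantiles = [0.5, 0.9, 0.99]

for team_size in (1, 2, 4, 8):
    pooled = rng.choice(true_field, size=team_size * 200)  # 200 samples/robot
    est = np.quantile(pooled, quantiles)
    err = np.abs(est - np.quantile(true_field, quantiles)).mean()
    print(f"{team_size} robots: mean quantile error {err:.3f}")
```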
We introduce a method that simultaneously learns to explore new large environments and to reconstruct them in 3D from color images only. This is closely related to the Next Best View problem (NBV), where one has to identify where to move the camera next to improve the coverage of an unknown scene. However, most current NBV methods rely on depth sensors, need 3D supervision, and/or do not scale to large scenes. Our method requires only a color camera and no 3D supervision. It simultaneously learns, in a self-supervised fashion, to predict a "volume occupancy field" from color images and, from this field, to predict the NBV. Thanks to this approach, our method performs well on new scenes, as it is not biased toward any 3D training data. We demonstrate this on a recent dataset made of various 3D scenes and show that our method performs even better than recent methods requiring a depth sensor, which is not a realistic assumption for outdoor scenes captured with a flying drone.
https://arxiv.org/abs/2303.03315
The use of ground control points (GCPs) for georeferencing is the most common strategy in unmanned aerial vehicle (UAV) photogrammetry, but at the same time their collection represents the most time-consuming and expensive part of UAV campaigns. Recently, deep learning has developed rapidly in the field of small-object detection. In this letter, to automatically extract the coordinate information of GCPs by detecting GCP markers in UAV images, we propose a solution that uses a deep learning-based architecture, YOLOv5-OBB, combined with a confidence-threshold filtering algorithm and an optimal ranking algorithm. We applied our proposed method to a dataset collected by a DJI Phantom 4 Pro drone and obtained good detection performance, with a mean Average Precision (AP) of 0.832 and a highest AP of 0.982 for the cross-type GCP markers. The proposed method can be a promising tool for future implementation of the end-to-end aerial triangulation process.
https://arxiv.org/abs/2303.03041
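The post-processing step can be pictured as a filter-then-rank routine over the detector's oriented-box outputs; plain confidence ordering stands in here for the paper's optimal ranking algorithm, which is not specified in the abstract.

```python
def select_gcp_markers(detections, conf_threshold=0.5, top_k=1):
    """Sketch of the post-processing described above: drop low-confidence
    oriented-box detections, then rank the survivors so the most reliable
    marker per image is kept. Plain confidence ordering is a stand-in for
    the paper's optimal ranking algorithm."""
    kept = [d for d in detections if d["conf"] >= conf_threshold]
    kept.sort(key=lambda d: d["conf"], reverse=True)
    return kept[:top_k]

dets = [{"center": (412.3, 880.1), "angle": 12.0, "conf": 0.96},
        {"center": (398.7, 865.4), "angle": 75.0, "conf": 0.41}]
print(select_gcp_markers(dets))   # -> the 0.96-confidence marker
```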
We present a novel approach for action recognition in UAV videos. Our formulation is designed to handle occlusion and viewpoint changes caused by the movement of a UAV. We use the concept of mutual information to compute and align the regions corresponding to human action or motion in the temporal domain. This enables our recognition model to learn from the key features associated with the motion. We also propose a novel frame sampling method that uses joint mutual information to acquire the most informative frame sequence in UAV videos. We have integrated our approach with X3D and evaluated the performance on multiple datasets. In practice, we achieve an 18.9% improvement in Top-1 accuracy over current state-of-the-art methods on UAV-Human (Li et al., 2021), a 7.3% improvement on Drone-Action (Perera et al., 2019), and a 7.16% improvement on NEC Drones (Choi et al., 2020). We will release the code at the time of publication.
https://arxiv.org/abs/2303.02575
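A greedy, pairwise stand-in for the joint-MI sampler conveys the idea: each added frame is the one sharing the least mutual information with those already selected. The histogram-based MI and the greedy rule are simplifications of the paper's joint formulation.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def sample_frames(frames, k=4, bins=16):
    """Greedy stand-in for the joint-MI frame sampling described above:
    starting from the first frame, repeatedly add the frame sharing the
    least mutual information with those already chosen, so the selected
    sequence covers diverse content. The pairwise histogram MI is only
    an illustrative proxy for the paper's joint criterion."""
    q = [(np.clip(f, 0, 1) * (bins - 1)).astype(int).ravel() for f in frames]
    chosen = [0]
    while len(chosen) < k:
        rest = [i for i in range(len(frames)) if i not in chosen]
        redundancy = {i: max(mutual_info_score(q[i], q[j]) for j in chosen)
                      for i in rest}
        chosen.append(min(redundancy, key=redundancy.get))
    return sorted(chosen)

video = [np.random.rand(32, 32) for _ in range(12)]   # stand-in UAV clip
print(sample_frames(video))
```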
In this paper, the problem of drone-assisted collaborative learning is considered. In this scenario, a swarm of intelligent wireless devices trains a shared neural network (NN) model with the help of a drone. Using its sensors, each device records samples from its environment to gather a local dataset for training. The training data is severely heterogeneous, as different devices have different amounts of data and sensor noise levels. The intelligent devices iteratively train the NN on their local datasets and exchange the model parameters with the drone for aggregation. For this system, the convergence rate of collaborative learning is derived while considering data heterogeneity, sensor noise levels, and communication errors; then, the drone trajectory that maximizes the final accuracy of the trained NN is obtained. The proposed trajectory optimization approach is aware of both the devices' data characteristics (i.e., local dataset size and noise level) and their wireless channel conditions, and it significantly improves the convergence rate and final accuracy in comparison with baselines that only consider data characteristics or channel conditions. Compared to state-of-the-art baselines, the proposed approach achieves an average 3.85% and 3.54% improvement in the final accuracy of the trained NN on benchmark datasets for image recognition and semantic segmentation tasks, respectively. Moreover, the proposed framework achieves a significant speedup in training, leading to average savings of 24% and 87% in drone hovering time, communication overhead, and battery usage for these respective tasks.
https://arxiv.org/abs/2303.02266
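On the drone side, heterogeneity-aware aggregation can be sketched as a weighted parameter average; the 1/(1 + noise) down-weighting below is illustrative, whereas the paper derives its weighting from the convergence analysis.

```python
import numpy as np

def aggregate(models, n_samples, noise_var):
    """Sketch of drone-side aggregation that is aware of the heterogeneity
    described above: each device's parameters are weighted by its dataset
    size and down-weighted by its sensor noise. The 1/(1 + sigma^2) factor
    is illustrative, not the paper's derived weighting."""
    w = np.array([n / (1.0 + s) for n, s in zip(n_samples, noise_var)])
    w /= w.sum()
    return sum(wi * m for wi, m in zip(w, models))

models = [np.random.randn(1000) for _ in range(3)]       # flattened NN params
print(aggregate(models, n_samples=[500, 2000, 800],      # heterogeneous data
                noise_var=[0.1, 0.5, 0.05]).shape)
```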