Real-time high-accuracy optical flow estimation is a crucial component in various applications, including localization and mapping in robotics, object tracking, and activity recognition in computer vision. While recent learning-based optical flow methods have achieved high accuracy, they often come with heavy computation costs. In this paper, we propose a highly efficient optical flow architecture, called NeuFlow, that addresses both high accuracy and computational cost concerns. The architecture follows a global-to-local scheme. Given the features of the input images extracted at different spatial resolutions, global matching is employed to estimate an initial optical flow at 1/16 resolution, capturing large displacements, which is then refined at 1/8 resolution with lightweight CNN layers for better accuracy. We evaluate our approach on a Jetson Orin Nano and an RTX 2080 to demonstrate efficiency improvements across different computing platforms. We achieve a notable 10x-80x speedup compared to several state-of-the-art methods, while maintaining comparable accuracy. Our approach achieves around 30 FPS on edge computing platforms, which represents a significant breakthrough in deploying complex computer vision tasks such as SLAM on small robots like drones. The full training and evaluation code is available at this https URL.
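The global-matching step can be pictured as an all-pairs feature correlation followed by a soft-argmax over target positions. Below is a minimal NumPy sketch of that idea; the function name, shapes, and softmax temperature are illustrative assumptions, not NeuFlow's actual implementation (which additionally performs the 1/8-resolution CNN refinement):

```python
import numpy as np

def global_matching_flow(feat1, feat2):
    """Global matching at a coarse (e.g. 1/16) resolution: every source
    feature is soft-matched against all target positions, and the flow
    is the probability-weighted (soft-argmax) displacement."""
    H, W, C = feat1.shape
    f1 = feat1.reshape(H * W, C)
    f2 = feat2.reshape(H * W, C)
    corr = f1 @ f2.T / np.sqrt(C)                    # (HW, HW) correlation volume
    prob = np.exp(corr - corr.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)          # softmax over target pixels
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (HW, 2) as (x, y)
    matched = prob @ grid                            # expected target coordinates
    return (matched - grid).reshape(H, W, 2)         # flow in coarse-grid pixels
```

On top of such an initial coarse flow, refinement layers only need to correct small residual displacements, which is what keeps the local stage lightweight.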
https://arxiv.org/abs/2403.10425
Forestry constitutes a key element for a sustainable future, while it is supremely challenging to introduce digital processes to improve efficiency. The main limitation is the difficulty of obtaining accurate maps at high temporal and spatial resolution as a basis for informed forestry decision-making, due to the vast area forests extend over and the sheer number of trees. To address this challenge, we present an autonomous Micro Aerial Vehicle (MAV) system which purely relies on cost-effective and light-weight passive visual and inertial sensors to perform under-canopy autonomous navigation. We leverage visual-inertial simultaneous localization and mapping (VI-SLAM) for accurate MAV state estimates and couple it with a volumetric occupancy submapping system to achieve a scalable mapping framework which can be directly used for path planning. As opposed to a monolithic map, submaps inherently deal with inevitable drift and corrections from VI-SLAM, since they move with pose estimates as they are updated. To ensure the safety of the MAV during navigation, we also propose a novel reference trajectory anchoring scheme that moves and deforms the reference trajectory the MAV is tracking upon state updates from the VI-SLAM system in a consistent way, even upon large changes in state estimates due to loop closures. We thoroughly validate our system in both real and simulated forest environments with high tree densities in excess of 400 trees per hectare and at speeds up to 3 m/s, while not encountering a single collision or system failure. To the best of our knowledge, this is the first system that achieves this level of performance in such unstructured environments using low-cost passive visual sensors and fully on-board computation, including VI-SLAM.
https://arxiv.org/abs/2403.09596
Trajectory prediction is an essential component in autonomous driving, particularly for collision avoidance systems. Considering the inherent uncertainty of the task, numerous studies have utilized generative models to produce multiple plausible future trajectories for each agent. However, most of them suffer from restricted representation ability or unstable training issues. To overcome these limitations, we propose utilizing the diffusion model to generate the distribution of future trajectories. Two cruxes must be settled to realize such an idea. First, the diversity of intention is intertwined with the uncertain surroundings, making the true distribution hard to parameterize. Second, the diffusion process is time-consuming during the inference phase, rendering it unrealistic to implement in a real-time driving system. We propose an Intention-aware denoising Diffusion Model (IDM), which tackles the above two problems. We decouple the original uncertainty into intention uncertainty and action uncertainty and model them with two dependent diffusion processes. To decrease the inference time, we reduce the variable dimensions in the intention-aware diffusion process and restrict the initial distribution of the action-aware diffusion process, which leads to fewer diffusion steps. To validate our approach, we conduct experiments on the Stanford Drone Dataset (SDD) and the ETH/UCY dataset. Our method achieves state-of-the-art results, with an FDE of 13.83 pixels on the SDD dataset and 0.36 meters on the ETH/UCY dataset. Compared with the original diffusion model, IDM reduces inference time by two-thirds. Interestingly, our experiments further reveal that introducing intention information is beneficial in modeling a diffusion process with fewer steps.
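The decoupling of intention and action uncertainty can be illustrated structurally: first sample a discrete intention (which goal the agent is headed to), then sample actions conditioned on it. The toy sampler below replaces both diffusion processes with much simpler stand-ins (a categorical draw and a noisy interpolation) purely to show the two-stage structure; all names, the straight-line prior, and the noise scale are assumptions, not IDM's actual math:

```python
import numpy as np

def sample_trajectory(start, goals, n_steps=12, rng=None):
    """Structural sketch of intention/action decoupling: pick one of
    several plausible goals (intention uncertainty), then generate a
    noisy path conditioned on it (action uncertainty). In IDM both
    stages are diffusion processes; here they are a categorical draw
    and a perturbed interpolation, for illustration only."""
    rng = rng or np.random.default_rng()
    goal = goals[rng.integers(len(goals))]                      # intention sample
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    path = (1 - t) * start + t * goal                           # straight-line prior
    path[1:-1] += rng.normal(scale=0.1, size=path[1:-1].shape)  # action noise
    return path
```

Conditioning the action stage on a sampled intention is what lets the second process start from a narrow distribution, which is the intuition behind needing fewer denoising steps.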
https://arxiv.org/abs/2403.09190
Safe road-crossing by self-driving vehicles is a crucial problem to address in smart cities. In this paper, we introduce a multi-sensor fusion approach to support road-crossing decisions in a system composed of an autonomous wheelchair and a flying drone featuring a robust sensory system made of diverse and redundant components. To that end, we designed an analytical danger function based on explainable physical conditions evaluated by single sensors, including those using machine learning and artificial vision. As a proof of concept, we provide an experimental evaluation in a laboratory environment, showing the advantages of using multiple sensors, which can improve decision accuracy and effectively support safety assessment. We made the dataset available to the scientific community for further experimentation. The work has been developed in the context of a European project named REXASI-PRO, which aims to develop trustworthy artificial intelligence for the social navigation of people with reduced mobility.
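An analytical danger function of the kind described can be as simple as a weighted combination of normalized per-sensor danger scores, with redundancy handled by dropping failed sensors. The sketch below is a hypothetical illustration, not the paper's actual formula; the weights, threshold, and failure convention (a failed sensor reports `None`) are assumptions:

```python
def fused_danger(scores, weights):
    """Illustrative analytical danger function: a weighted average of
    per-sensor danger scores in [0, 1]. A failed sensor reports None
    and is dropped, exploiting redundancy among diverse components."""
    pairs = [(s, w) for s, w in zip(scores, weights) if s is not None]
    if not pairs:
        return 1.0  # no working sensor: assume maximum danger
    total_w = sum(w for _, w in pairs)
    return sum(s * w for s, w in pairs) / total_w

def safe_to_cross(scores, weights, threshold=0.3):
    """Allow the crossing only when the fused danger is below a threshold."""
    return fused_danger(scores, weights) < threshold
```

Because each score maps to an explainable physical condition, a human can audit exactly which sensor pushed the fused value above the threshold.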
https://arxiv.org/abs/2403.08984
Avian-informed drones feature morphing wing and tail surfaces, enhancing agility and adaptability in flight. Despite their large potential, realising their full capabilities remains challenging due to the lack of generalized control strategies accommodating their large degrees of freedom and cross-coupling effects between their control surfaces. Here we propose a new body-rate controller for avian-informed drones that uses all available actuators to control the motion of the drone. The method exhibits robustness against physical perturbations, turbulent airflow, and even loss of certain actuators mid-flight. Furthermore, wing and tail morphing is leveraged to enhance energy efficiency at 8 m/s, 10 m/s and 12 m/s using in-flight Bayesian optimization. The resulting morphing configurations yield significant gains of up to 11.5% across all three speeds compared to non-morphing configurations and display a strong resemblance to avian flight at different speeds. This research lays the groundwork for the development of autonomous avian-informed drones that operate under diverse wind conditions, emphasizing the role of morphing in improving energy efficiency.
https://arxiv.org/abs/2403.08598
Robots able to run, fly, and grasp have a high potential to solve a wide range of tasks and navigate in complex environments. Several mechatronic designs of such robots with adaptive morphologies are emerging. However, landing on uneven surfaces, traversing rough terrain, and manipulating objects remain highly challenging. This paper introduces the design of MorphoGear, a novel rotor UAV with morphogenetic gear, and includes a description of the robot's mechanics, electronics, and control architecture, as well as its walking behavior and an analysis of experimental results. MorphoGear is able to fly, walk on surfaces with several gaits, and grasp objects with its four compatible robotic limbs. Robotic limbs with three degrees of freedom (DoFs) are used by this UAV as pedipulators when walking or flying and as manipulators when performing actions in the environment. We performed a locomotion analysis of the landing gear of the robot. Three types of robot gaits have been developed. The experimental results revealed the low cross-track error of the most accurate gait (mean of 1.9 cm and max of 5.5 cm) and the ability of the drone to move with a 210 mm step length. Another type of robot gait also showed low cross-track error (mean of 2.3 cm and max of 6.9 cm). The proposed MorphoGear system can potentially accomplish a wide range of tasks in environmental surveying, delivery, and high-altitude operations.
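The reported gait-accuracy numbers are cross-track statistics: perpendicular deviations of the walked positions from a straight reference line. A small NumPy helper showing how such mean/max errors can be computed (the function name and the geometry in the test are illustrative, not the authors' evaluation code):

```python
import numpy as np

def crosstrack_stats(path_xy, ref_start, ref_end):
    """Mean and max cross-track error of a walked path: the perpendicular
    distance of each footstep position from the straight reference line
    between ref_start and ref_end."""
    p = np.asarray(path_xy, dtype=float)
    a = np.asarray(ref_start, dtype=float)
    d = np.asarray(ref_end, dtype=float) - a
    d /= np.linalg.norm(d)                     # unit direction of the line
    rel = p - a
    cross = rel[:, 0] * d[1] - rel[:, 1] * d[0]  # signed 2D cross product
    err = np.abs(cross)
    return err.mean(), err.max()
```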
https://arxiv.org/abs/2403.08340
Acting is an important decisional function for autonomous robots. Acting relies on skills to implement and to model the activities it oversees: refinement, local recovery, temporal dispatching, external asynchronous events, and command execution, all done online. While sitting between planning and the robotic platform, acting often relies on programming primitives and an interpreter which executes these skills. Following our experience in providing a formal framework to program the functional components of our robots, we propose a new language to program the acting skills. This language maps unequivocally into a formal model which can then be used to check properties offline, or to execute the skills (more precisely, their formal equivalents) and perform runtime verification. We illustrate with a real example how we can program a survey mission for a drone in this new language, prove some formal properties of the program, and directly execute the formal model on the drone to perform the mission.
https://arxiv.org/abs/2403.07770
Drones, also known as UAVs, were originally designed for military purposes. With technological advances, they can now be seen in most aspects of life, from filming to logistics. The increased use of drones has sometimes made it essential for them to collaborate in order to perform tasks efficiently within a defined process. This paper investigates the use of a combined centralised and decentralised architecture for the collaborative operation of drones in a parts-delivery scenario to enable and expedite the operation of the factories of the future. The centralised and decentralised approaches were extensively researched, with experimentation undertaken to determine the appropriateness of each approach for this use case. Decentralised control was utilised to remove the need for excessive communication during the operation of the drones, resulting in smoother operations. Initial results suggested that the decentralised approach is more appropriate for this use case. The individual functionalities necessary for the implementation of a decentralised architecture were proven and assessed, determining that a combination of multiple individual functionalities, namely VSLAM, dynamic collision avoidance, and object tracking, would give an appropriate solution for use in an industrial setting. A final architecture for the parts-delivery system was proposed for future work, using a combined centralised and decentralised approach to combat the limitations inherent in each architecture.
https://arxiv.org/abs/2403.07635
This paper introduces iRoCo (intuitive Robot Control), a framework for ubiquitous human-robot collaboration using a single smartwatch and smartphone. By integrating probabilistic differentiable filters, iRoCo optimizes a combination of precise robot control and unrestricted user movement from ubiquitous devices. We demonstrate and evaluate the effectiveness of iRoCo in practical teleoperation and drone piloting applications. Comparative analysis shows no significant difference between task performance with iRoCo and gold-standard control systems in teleoperation tasks. Additionally, iRoCo users complete drone piloting tasks 32% faster than with a traditional remote control and report less frustration in a subjective load index questionnaire. Our findings strongly suggest that iRoCo is a promising new approach for intuitive robot control through smartwatches and smartphones from anywhere, at any time. The code is available at this http URL
https://arxiv.org/abs/2403.07199
This article describes the 2023 IEEE Low-Power Computer Vision Challenge (LPCVC). Since 2015, LPCVC has been an international competition devoted to tackling the challenge of computer vision (CV) on edge devices. Most CV researchers focus on improving accuracy, at the expense of ever-growing sizes of machine models. LPCVC balances accuracy with resource requirements. Winners must achieve high accuracy with short execution time when their CV solutions run on an embedded device, such as a Raspberry Pi or an Nvidia Jetson Nano. The vision problem for the 2023 LPCVC is the segmentation of images acquired by Unmanned Aerial Vehicles (UAVs, also called drones) after disasters. The 2023 LPCVC attracted 60 international teams that submitted 676 solutions during the one-month submission window. This article explains the setup of the competition and highlights the winners' methods that improve accuracy and shorten execution time.
https://arxiv.org/abs/2403.07153
Mastering autonomous drone landing on dynamic platforms presents formidable challenges due to unpredictable velocities and external disturbances caused by wind, ground effect, or the turbines or propellers of the docking platform. This study introduces an advanced Deep Reinforcement Learning (DRL) agent, this http URL, designed to navigate and land on platforms in the presence of windy conditions, thereby enhancing drone autonomy and safety. this http URL is rigorously trained within the gym-pybullet-drone simulation, an environment that mirrors real-world complexities, including wind turbulence, to ensure the agent's robustness and adaptability. The agent's capabilities were empirically validated with Crazyflie 2.1 drones across various test scenarios, encompassing both simulated environments and real-world conditions. The experimental results showcased this http URL's high-precision landing and its ability to adapt to moving platforms, even under wind-induced disturbances. Furthermore, the system performance was benchmarked against a baseline PID controller augmented with an Extended Kalman Filter, illustrating significant improvements in landing precision and error recovery. this http URL leverages bio-inspired learning to adapt to external forces as birds do, enhancing drone adaptability without knowing force magnitudes. This research not only advances drone landing technologies, essential for inspection and emergency applications, but also highlights the potential of DRL in addressing intricate aerodynamic challenges.
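The baseline controller mentioned above is a textbook PID loop (in the paper it is paired with an Extended Kalman Filter for state estimation, which is omitted here). The sketch below closes such a loop around a crude one-dimensional altitude plant; the gains and dynamics are illustrative assumptions, not the authors' tuning:

```python
class PID:
    """Textbook PID controller, the kind of baseline the DRL agent is
    benchmarked against (the EKF state estimator is not modeled here)."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = None

    def step(self, setpoint, measurement):
        err = setpoint - measurement
        self.integral += err * self.dt
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# Toy 1D descent: drive the altitude toward a platform at 0 m.
pid = PID(kp=2.0, ki=0.1, kd=0.5, dt=0.02)
alt, vel = 5.0, 0.0
for _ in range(1000):
    u = pid.step(0.0, alt)          # thrust command toward the platform
    vel += (u - 0.2 * vel) * 0.02   # crude damped point-mass dynamics
    alt += vel * 0.02
```

Such a controller has no notion of unmodeled external forces, which is precisely the gap the learned, bio-inspired policy is meant to close.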
https://arxiv.org/abs/2403.06572
We present a fully autonomous self-recharging drone system capable of long-duration sustained operations near powerlines. The drone is equipped with a robust onboard perception and navigation system that enables it to locate powerlines and approach them for landing. A passively actuated gripping mechanism grasps the powerline cable during landing after which a control circuit regulates the magnetic field inside a split-core current transformer to provide sufficient holding force as well as battery recharging. The system is evaluated in an active outdoor three-phase powerline environment. We demonstrate multiple contiguous hours of fully autonomous uninterrupted drone operations composed of several cycles of flying, landing, recharging, and takeoff, validating the capability of extended, essentially unlimited, operational endurance.
https://arxiv.org/abs/2403.06533
Vision is an important metaphor in ethical and political questions of knowledge. The feminist philosopher Donna Haraway points out the "perverse" nature of an intrusive, alienating, all-seeing vision (to which we might cry out "stop looking at me!"), but also encourages us to embrace the embodied nature of sight and its promises for genuinely situated knowledge. Current technologies of machine vision -- surveillance cameras, drones (for war or recreation), iPhone cameras -- are usually construed as instances of the former rather than the latter, and for good reasons. However, although in no way attempting to diminish the real suffering these technologies have brought about in the world, I make the case for understanding technologies of computer vision as material instances of embodied seeing and situated knowing. Furthermore, borrowing from Iris Murdoch's concept of moral vision, I suggest that these technologies direct our labor towards self-reflection in ethically significant ways. My approach draws upon paradigms in computer vision research, phenomenology, and feminist epistemology. Ultimately, this essay is an argument for directing more philosophical attention from merely criticizing technologies of vision as ethically deficient towards embracing them as complex, methodologically and epistemologically important objects.
https://arxiv.org/abs/2403.05805
Solar energy is rapidly becoming a robust renewable alternative to conventional finite resources such as fossil fuels. It is harvested using interconnected photovoltaic panels, typically built with crystalline silicon cells, i.e., semiconducting materials that effectively convert solar radiation into electricity. However, crystalline silicon is fragile and vulnerable to cracking over time or during predictive maintenance tasks, which can lead to the electrical isolation of parts of the solar cell and even failure, thus affecting the panel performance and reducing electricity generation. This work aims to develop a system for detecting cell cracks in solar panels to anticipate and alert of a potential failure of the photovoltaic system by using computer vision techniques. Three scenarios are defined where these techniques will bring value. In scenario A, images are taken manually and the system detecting failures in the solar cells is not subject to any computational constraints. In scenario B, an edge device is placed near the solar farm, able to make inferences. Finally, in scenario C, a small microcontroller is placed on a drone flying over the solar farm, making inferences about the solar cells' states. Three different architectures are found to be the most suitable solutions, one for each scenario, namely the InceptionV3 model, an EfficientNetB0 model shrunk by full-integer quantization, and a customized CNN architecture built with VGG16 blocks.
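Full-integer quantization, as applied to the EfficientNetB0 model for the constrained scenarios, maps float values to int8 using a scale and zero-point. The snippet below shows only the affine quantize/dequantize arithmetic on a list of weights; real deployments use a converter toolchain (e.g., a mobile inference framework) rather than hand-rolled code like this:

```python
def quantize_int8(w):
    """Toy full-integer (int8) affine quantization of a float list:
    compute a scale and zero-point from the value range, round to
    int8, and return both the quantized values and their float
    reconstruction. Illustrative only."""
    lo, hi = min(w), max(w)
    scale = (hi - lo) / 255.0 or 1.0        # guard against a constant tensor
    zp = round(-128 - lo / scale)           # zero-point maps lo -> -128
    q = [max(-128, min(127, round(x / scale) + zp)) for x in w]
    deq = [(v - zp) * scale for v in q]     # what the int8 model "sees"
    return q, deq
```

The reconstruction error is bounded by roughly one quantization step, which is why int8 models usually lose little accuracy while shrinking 4x and running on integer-only hardware.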
https://arxiv.org/abs/2403.05694
Cross-view geo-localization aims to match images of the same target from different platforms, e.g., drone and satellite. It is a challenging task due to changes in both the appearance of targets and environmental content across views. Existing methods mainly focus on extracting more comprehensive information through feature-map segmentation, but they inevitably destroy the image structure and are sensitive to shifts and scale changes of the target in the query. To address the above issues, we introduce a simple yet effective part-based representation learning method, called shifting-dense partition learning (SDPL). Specifically, we propose the dense partition strategy (DPS), which divides the image into multiple parts to explore contextual information while explicitly maintaining the global structure. To handle scenarios with non-centered targets, we further propose the shifting-fusion strategy, which generates multiple sets of parts in parallel based on various segmentation centers and then adaptively fuses all features to select the best partitions. Extensive experiments show that our SDPL is robust to position shifting and scale variations, and achieves competitive performance on two prevailing benchmarks, i.e., University-1652 and SUES-200.
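The dense partition idea can be approximated as splitting a feature map into concentric square rings around a chosen center and pooling each ring; shifting that center yields the multiple part-sets that the shifting-fusion strategy then fuses. The NumPy sketch below is a simplified stand-in for DPS (the ring definition via Chebyshev distance, mean pooling, and shapes are assumptions, not the paper's exact design):

```python
import numpy as np

def ring_partition(feat, center, n_rings):
    """Split an HxWxC feature map into concentric square rings around
    `center` (bands of Chebyshev distance) and mean-pool each ring.
    Varying `center` produces the shifted part-sets used for fusion."""
    H, W, C = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    cheb = np.maximum(np.abs(ys - center[0]), np.abs(xs - center[1]))
    edges = np.linspace(0, cheb.max() + 1, n_rings + 1)
    parts = []
    for k in range(n_rings):
        mask = (cheb >= edges[k]) & (cheb < edges[k + 1])
        parts.append(feat[mask].mean(axis=0))   # pooled feature per ring
    return np.stack(parts)                      # (n_rings, C)
```

Unlike a hard grid, such ring parts keep the global center-to-border structure of the image, which is what makes them tolerant to the target drifting off-center.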
https://arxiv.org/abs/2403.04172
Sub-50 g nano-drones are gaining momentum in both academia and industry. Their most compelling applications rely on onboard deep learning models for perception despite severe hardware constraints (i.e., sub-100 mW processors). When deployed in unknown environments not represented in the training data, these models often underperform due to domain shift. To cope with this fundamental problem, we propose, for the first time, on-device learning aboard nano-drones, where the first part of the in-field mission is dedicated to self-supervised fine-tuning of a pre-trained convolutional neural network (CNN). Leveraging a real-world vision-based regression task, we thoroughly explore performance-cost trade-offs of the fine-tuning phase along three axes: i) dataset size (more data increases the regression performance but requires more memory and longer computation); ii) methodologies (e.g., fine-tuning all model parameters vs. only a subset); and iii) self-supervision strategy. Our approach demonstrates an improvement in mean absolute error up to 30% compared to the pre-trained baseline, requiring only 22 s of fine-tuning on an ultra-low-power GWT GAP9 System-on-Chip. Addressing the domain shift problem via on-device learning aboard nano-drones not only marks a novel result for hardware-limited robots but lays the ground for more general advancements for the entire robotics community.
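Fine-tuning only a subset of parameters can be pictured as freezing the backbone and running a few cheap SGD steps on a small head using in-field data. The toy below uses a fixed random projection as the "backbone" and a linear head; all shapes, the synthetic data, and the label source are illustrative stand-ins, not the paper's self-supervised CNN workload on the GAP9:

```python
import numpy as np

# Toy stand-in for on-device fine-tuning: the backbone stays frozen and
# only the small head (w, b) is updated, keeping memory and compute low.
rng = np.random.default_rng(0)
backbone = rng.normal(size=(16, 8)) * 0.25   # frozen feature extractor
X = rng.normal(size=(64, 16))                # "in-field" inputs
F = np.tanh(X @ backbone)                    # features, computed once
y = F @ rng.normal(size=8)                   # synthetic regression targets
w, b = np.zeros(8), 0.0                      # the only trainable parameters
for _ in range(2000):                        # short, cheap SGD schedule
    err = F @ w + b - y
    w -= 0.3 * F.T @ err / len(y)
    b -= 0.3 * err.mean()
mae = np.abs(F @ w + b - y).mean()           # should shrink toward zero
```

Freezing the backbone also means the features `F` can be computed once and cached, which is exactly the kind of memory/compute trade-off the dataset-size axis measures.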
https://arxiv.org/abs/2403.04071
We propose a method for autonomous precision drone landing with fiducial markers and a gimbal-mounted, multi-payload camera with wide-angle, zoom, and IR sensors. The method has minimal data requirements; it depends primarily on the direction from the drone to the landing pad, enabling it to switch dynamically between the camera's different sensors and zoom factors, and minimizing auxiliary sensor requirements. It eliminates the need for data such as altitude above ground level, straight-line distance to the landing pad, fiducial marker size, and 6-DoF marker pose (of which the orientation is problematic). We leverage the zoom and wide-angle cameras, as well as visual AprilTag fiducial markers, to conduct successful precision landings from much longer distances than in previous work (168 m horizontal distance, 102 m altitude). We use two types of AprilTags in the IR spectrum, active and passive, for precision landing both at daytime and nighttime, instead of the simple IR beacons used in most previous work. The active IR landing pad is heated; the novel, passive one is unpowered, at ambient temperature, and depends on its high reflectivity and an IR differential between the ground and the sky. Finally, we propose a high-level control policy to manage the initial search for the landing pad and subsequent searches if it is lost, which was not addressed in previous work. The method demonstrates successful landings with the landing skids at least touching the landing pad, achieving an average error of 0.19 m. It also demonstrates successful recovery and landing when the landing pad is temporarily obscured.
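Because the method needs only the direction to the pad, the core geometric quantity is a bearing computed from the marker's pixel location and the intrinsics of whichever sensor and zoom factor is currently active. A minimal pinhole-model sketch (the function name is made up; fx, fy, cx, cy are the usual camera intrinsics, which change when the zoom or sensor changes):

```python
import math

def bearing_to_pad(u, v, fx, fy, cx, cy):
    """Unit direction vector in the camera frame toward a fiducial
    detected at pixel (u, v), under a pinhole model with intrinsics
    fx, fy (focal lengths) and cx, cy (principal point). Only the
    direction is needed, so range and marker size never enter."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    n = math.sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, 1.0 / n)
```

Switching to a longer focal length only swaps the intrinsics fed into this computation, which is why the approach can hand over between wide-angle, zoom, and IR sensors without extra range data.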
https://arxiv.org/abs/2403.03806
Underwater soft grippers exhibit potential for applications such as monitoring, research, and object retrieval. However, existing underwater gripping techniques frequently cause disturbances to ecosystems. In response to this challenge, we present a novel underwater gripping framework comprising a lightweight gripper affixed to a custom submarine pod deployable via drone. This approach minimizes water disturbance and enables efficient navigation to target areas, enhancing overall mission effectiveness. The pod allows for underwater motion and is characterized by four degrees of freedom. It is provided with a custom buoyancy system, two water pumps for differential thrust and two for pitching. The system allows for buoyancy adjustments up to a depth of 6 meters, as well as motion in the plane. The 3-fingered gripper is manufactured out of silicone and was successfully tested on objects with different shapes and sizes, demonstrating a maximum pulling force of up to 8 N when underwater. The reliability of the submarine pod was tested in a water tank by tracking its attitude and energy consumption during grasping maneuvers. The system also accomplished a successful mission in a lake, where it was deployed on a hexacopter. Overall, the integration of this system expands the operational capabilities of underwater grasping, makes grasping missions more efficient and easier to automate, and causes less disturbance to the water ecosystem.
https://arxiv.org/abs/2403.01891
Aerial robots show significant potential for forest canopy research and environmental monitoring by providing data collection capabilities at high spatial and temporal resolutions. However, limited flight endurance hinders their application. Inspired by natural perching behaviours, we propose a multi-modal aerial robot system that integrates tensile perching for energy conservation and a suspended actuated pod for data collection. The system consists of a quadrotor drone, a slewing ring mechanism allowing 360° tether rotation, and a streamlined pod with two ducted propellers connected via a tether. Winding and unwinding the tether allows the pod to move within the canopy, and activating the propellers allows the tether to be wrapped around branches for perching or disentangling. We experimentally determined the minimum counterweights required for stable perching under various conditions. Building on this, we devised and evaluated multiple perching and disentangling strategies. Comparisons of perching and disentangling manoeuvres demonstrate energy savings that could be further maximized with the use of the pod or tether winding. These approaches can reduce energy consumption to only 22% and 1.5%, respectively, compared to a drone disentangling manoeuvre. We also calculated the minimum idle time required by the proposed system after perching and motor shutdown to save energy on a mission, which is 48.9% of the operating time. Overall, the integrated system expands the operational capabilities and enhances the energy efficiency of aerial robots for long-term monitoring tasks.
https://arxiv.org/abs/2403.01890
In this paper, we propose a cost-effective strategy for heterogeneous UAV swarm systems for cooperative aerial inspection. Unlike previous swarm inspection works, the proposed method does not rely on precise prior knowledge of the environment and can complete full 3D surface coverage of objects of any shape. In this work, agents are partitioned into teams, with each drone assigned a different task, including mapping, exploration, and inspection. Task allocation is facilitated by assigning optimal inspection volumes to each team, following best-first rules. A voxel-map-based representation of the environment is used for pathfinding, and a rule-based path-planning method is the core of this approach. We achieved the best performance in all challenging experiments with the proposed approach, surpassing all benchmark methods for similar tasks across multiple evaluation trials. The proposed method is open source at this https URL and was used as the baseline of the Cooperative Aerial Robots Inspection Challenge at the 62nd IEEE Conference on Decision and Control 2023.
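Best-first task allocation can be sketched as a greedy loop: hand out the largest remaining inspection volume to the least-loaded team. This is a hedged simplification (scalar "volumes", no geometry or task types), not the released implementation:

```python
def allocate_volumes(teams, volumes):
    """Illustrative best-first allocation: volumes (here plain scalar
    sizes; the paper assigns actual 3D inspection volumes) are handed
    out largest-first to whichever team currently has the least total
    work, balancing load across heterogeneous teams."""
    load = {t: 0.0 for t in teams}
    plan = {t: [] for t in teams}
    for vol in sorted(volumes, reverse=True):   # best-first ordering
        t = min(load, key=load.get)             # least-loaded team so far
        plan[t].append(vol)
        load[t] += vol
    return plan
```

Greedy largest-first assignment is a classic load-balancing heuristic; it keeps team workloads within one volume of each other without any global optimization.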
https://arxiv.org/abs/2403.01225