This study delves into the flight behavior of Budgerigars (Melopsittacus undulatus) to gain insights into their flight trajectories and movements. Using 3D reconstruction from stereo video camera recordings, we closely examine velocity and acceleration patterns during three flight phases: takeoff, flying, and landing. The findings not only contribute to our understanding of bird behavior but also hold significant implications for the advancement of algorithms in Unmanned Aerial Vehicles (UAVs). The research aims to bridge the gap between biological principles observed in birds and the application of these insights to developing more efficient and autonomous UAVs. In the context of the increasing use of drones, this study focuses on the biologically inspired principles drawn from bird behavior, particularly during takeoff, flying, and landing, to enhance UAV capabilities. The dataset created for this research sheds light on Budgerigars' takeoff, flying, and landing techniques, emphasizing their ability to control speed across different situations and surfaces. The study underscores the potential of incorporating these principles into UAV algorithms, addressing challenges related to short-range navigation, takeoff, flying, and landing.
https://arxiv.org/abs/2312.00597
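The velocity and acceleration analysis described above can be sketched with simple finite differences over a reconstructed 3D trajectory. This is an illustrative sketch, not the paper's code; the 200 Hz frame rate and the toy trajectory below are assumptions.

```python
# Estimate per-frame velocity and acceleration from a stereo-reconstructed
# 3D trajectory using central differences (one-sided at the endpoints).

def central_diff(points, dt):
    """Numerical first derivative of a list of 3D points."""
    n = len(points)
    deriv = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        span = (hi - lo) * dt
        deriv.append(tuple((points[hi][k] - points[lo][k]) / span for k in range(3)))
    return deriv

def speed(v):
    """Magnitude of a 3D velocity vector."""
    return sum(c * c for c in v) ** 0.5

dt = 1.0 / 200.0  # assumed camera frame interval
# Toy takeoff-like track: constant 2 m/s^2 acceleration along x, steady climb in z.
track = [((i * dt) ** 2, 0.0, 0.1 * i * dt) for i in range(10)]

velocity = central_diff(track, dt)        # m/s per frame
acceleration = central_diff(velocity, dt) # m/s^2 per frame
```

The same two passes, applied per flight phase, yield the speed-control profiles the dataset is meant to expose.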
Images of the natural world, collected by a variety of cameras, from drones to individual phones, are increasingly abundant sources of biological information. There is an explosion of computational methods and tools, particularly computer vision, for extracting biologically relevant information from images for science and conservation. Yet most of these are bespoke approaches designed for a specific task and are not easily adaptable or extendable to new questions, contexts, and datasets. A vision model for general organismal biology questions on images is thus a timely need. To approach this, we curate and release TreeOfLife-10M, the largest and most diverse ML-ready dataset of biology images. We then develop BioCLIP, a foundation model for the tree of life, leveraging the unique properties of biology captured by TreeOfLife-10M, namely the abundance and variety of images of plants, animals, and fungi, together with the availability of rich structured biological knowledge. We rigorously benchmark our approach on diverse fine-grained biology classification tasks, and find that BioCLIP consistently and substantially outperforms existing baselines (by 17% to 20% absolute). Intrinsic evaluation reveals that BioCLIP has learned a hierarchical representation conforming to the tree of life, shedding light on its strong generalizability. Our code, models and data will be made available at this https URL.
https://arxiv.org/abs/2311.18803
Heterogeneous robots equipped with multi-modal sensors (e.g., UAV, wheeled and legged terrestrial robots) provide rich and complementary functions that may help human operators to accomplish complex tasks in unknown environments. However, seamlessly integrating heterogeneous agents and making them interact and collaborate still raises challenging issues. In this paper, we define a ROS 2-based software architecture that makes it possible to build incarnated heterogeneous multi-agent systems (HMAS) in a generic way. We showcase its effectiveness through a scenario integrating aerial drones, quadruped robots, and human operators (see this https URL). In addition, agent spatial awareness in unknown outdoor environments is a critical step toward realizing autonomous individual movements, interactions, and collaborations. Through intensive experimental measurements, RTK-GPS is shown to be a suitable solution for achieving the required locating accuracy.
https://arxiv.org/abs/2311.18394
Solving Hamilton-Jacobi-Isaacs (HJI) PDEs enables equilibrial feedback control in two-player differential games, yet faces the curse of dimensionality (CoD). While physics-informed machine learning has been adopted to address CoD in solving PDEs, this method falls short in learning discontinuous solutions due to its sampling nature, leading to poor safety performance of the resulting controllers in robotics applications where values are discontinuous due to state or other temporal logic constraints. In this study, we explore three potential solutions to this problem: (1) a hybrid learning method that uses both equilibrium demonstrations and the HJI PDE, (2) a value-hardening method where a sequence of HJIs is solved with an increasing Lipschitz constant on the constraint-violation penalty, and (3) the epigraphical technique that lifts the value to a higher-dimensional auxiliary state space where the value becomes continuous. Evaluations through 5D and 9D vehicle simulations and 13D drone simulations reveal that the hybrid method outperforms others in terms of generalization and safety performance.
https://arxiv.org/abs/2311.16520
Hand gestures play a significant role in human interactions where non-verbal intentions, thoughts and commands are conveyed. In Human-Robot Interaction (HRI), hand gestures offer a similar and efficient medium for conveying clear and rapid directives to a robotic agent. However, state-of-the-art vision-based methods for gesture recognition have been shown to be effective only up to a user-camera distance of seven meters. Such a short distance range limits practical HRI with, for example, service robots, search and rescue robots and drones. In this work, we address the Ultra-Range Gesture Recognition (URGR) problem by aiming for a recognition distance of up to 25 meters and in the context of HRI. We propose a novel deep-learning framework for URGR using solely a simple RGB camera. First, a novel super-resolution model termed HQ-Net is used to enhance the low-resolution image of the user. Then, we propose a novel URGR classifier termed Graph Vision Transformer (GViT) which takes the enhanced image as input. GViT combines the benefits of a Graph Convolutional Network (GCN) and a modified Vision Transformer (ViT). Evaluation of the proposed framework over diverse test data yields a high recognition rate of 98.1%. The framework has also exhibited superior performance compared to human recognition in ultra-range distances. With the framework, we analyze and demonstrate the performance of an autonomous quadruped robot directed by human gestures in complex ultra-range indoor and outdoor environments.
https://arxiv.org/abs/2311.15361
In this study, we present a novel paradigm for industrial robotic embodied agents, encapsulating an 'agent as cerebrum, controller as cerebellum' architecture. Our approach harnesses the power of Large Multimodal Models (LMMs) within an agent framework known as AeroAgent, tailored for drone technology in industrial settings. To facilitate seamless integration with robotic systems, we introduce ROSchain, a bespoke linkage framework connecting LMM-based agents to the Robot Operating System (ROS). We report findings from extensive empirical research, including simulated experiments on Airgen and a real-world case study, particularly in individual search and rescue operations. The results demonstrate AeroAgent's superior performance in comparison to existing Deep Reinforcement Learning (DRL)-based agents, highlighting the advantages of the embodied LMM in complex, real-world scenarios.
https://arxiv.org/abs/2311.15033
The power requirements posed by the fifth-generation and beyond cellular networks are an important constraint in network deployment and require energy-efficient solutions. In this work, we propose a novel user load transfer approach using airborne base stations (BS), mounted on drones, for reliable and secure power redistribution across the micro-grid network comprising green small cell BSs. Depending on the user density and the availability of an aerial BS, the energy requirement of a cell with an energy deficit is accommodated by migrating the aerial BS from a high-energy to a low-energy cell. The proposed hybrid drone-based framework integrates long short-term memory with unique cost functions using an evolutionary neural network for drones and BSs, and efficiently manages energy and load redistribution. The proposed algorithm reduces power outages at BSs and maintains consistent throughput stability, thereby demonstrating its capability to boost the reliability and robustness of wireless communication systems.
https://arxiv.org/abs/2311.12944
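The core migration rule above (move an aerial BS from a high-energy cell to an energy-deficient one) can be sketched as a greedy selection. This is a hypothetical illustration; the cell records, field names, and deficit threshold are assumptions, and the paper's framework additionally layers LSTM forecasting and evolutionary cost functions on top of such decisions.

```python
# Greedy aerial-BS migration sketch: pick the richest cell that currently hosts
# a drone-mounted BS as the donor, and the deepest-deficit cell as the receiver.

def plan_migration(cells, deficit_threshold=5.0):
    """cells: list of dicts with 'id', 'energy' (surplus in kWh, negative =
    deficit), and 'has_aerial_bs'. Returns (src_id, dst_id) or None."""
    donors = [c for c in cells if c["has_aerial_bs"]]
    needy = [c for c in cells
             if c["energy"] < -deficit_threshold and not c["has_aerial_bs"]]
    if not donors or not needy:
        return None  # nothing to move, or no cell in real deficit
    src = max(donors, key=lambda c: c["energy"])  # richest donor cell
    dst = min(needy, key=lambda c: c["energy"])   # deepest deficit
    return (src["id"], dst["id"])

cells = [
    {"id": "A", "energy": 12.0, "has_aerial_bs": True},
    {"id": "B", "energy": -2.0, "has_aerial_bs": False},
    {"id": "C", "energy": -9.0, "has_aerial_bs": False},
]
move = plan_migration(cells)
```

Repeating this rule each scheduling epoch, with forecast rather than instantaneous energy levels, is what the hybrid framework automates.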
Drone navigation through natural language commands remains a significant challenge due to the lack of publicly available multi-modal datasets and the intricate demands of fine-grained visual-text alignment. In response to this pressing need, we present a new human-computer interaction annotation benchmark called GeoText-1652, meticulously curated through a robust Large Language Model (LLM)-based data generation framework and the expertise of pre-trained vision models. This new dataset seamlessly extends the existing image dataset, i.e., University-1652, with spatial-aware text annotations, encompassing intricate image-text-bounding box associations. Besides, we introduce a new optimization objective to leverage fine-grained spatial associations, called blending spatial matching, for region-level spatial relation matching. Extensive experiments reveal that our approach maintains an exceptional recall rate under varying description complexities. This underscores the promising potential of our approach in elevating drone control and navigation through the seamless integration of natural language commands in real-world scenarios.
https://arxiv.org/abs/2311.12751
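Region-level spatial relations of the kind a matching objective must score can be made concrete with bounding-box geometry. This is a minimal sketch; the relation vocabulary and box coordinates are illustrative assumptions, not GeoText-1652's actual annotation scheme.

```python
# Coarse spatial relation between a target box and a reference box,
# both given as (x1, y1, x2, y2) in image pixels.

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def spatial_relation(target, reference):
    """Return the dominant-axis relation: 'left of', 'right of', 'above', 'below'."""
    tx, ty = center(target)
    rx, ry = center(reference)
    dx, dy = tx - rx, ty - ry
    if abs(dx) >= abs(dy):
        return "right of" if dx > 0 else "left of"
    return "below" if dy > 0 else "above"  # image y grows downward

building = (100, 100, 200, 200)
pool = (300, 120, 360, 180)
relation = spatial_relation(pool, building)
```

A caption phrase like "the pool right of the building" can then be checked against such computed relations, which is the flavor of supervision region-level spatial matching exploits.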
We study how to design learning-based adaptive controllers that enable fast and accurate online adaptation in changing environments. In these settings, learning is typically done during an initial (offline) design phase, where the vehicle is exposed to different environmental conditions and disturbances (e.g., a drone exposed to different winds) to collect training data. Our work is motivated by the observation that real-world disturbances fall into two categories: 1) those that can be directly monitored or controlled during training, which we call "manageable", and 2) those that cannot be directly measured or controlled (e.g., nominal model mismatch, air plate effects, and unpredictable wind), which we call "latent". Imprecise modeling of these effects can result in degraded control performance, particularly when latent disturbances continuously vary. This paper presents the Hierarchical Meta-learning-based Adaptive Controller (HMAC) to learn and adapt to such multi-source disturbances. Within HMAC, we develop two techniques: 1) Hierarchical Iterative Learning, which jointly trains representations to capture the various sources of disturbances, and 2) Smoothed Streaming Meta-Learning, which learns to capture the evolving structure of latent disturbances over time (in addition to standard meta-learning on the manageable disturbances). Experimental results demonstrate that HMAC exhibits more precise and rapid adaptation to multi-source disturbances than other adaptive controllers.
https://arxiv.org/abs/2311.12367
In this work, we use multi-view aerial images to reconstruct the geometry, lighting, and material of facades using neural signed distance fields (SDFs). Without the requirement of complex equipment, our method only takes simple RGB images captured by a drone as inputs to enable physically based and photorealistic novel-view rendering, relighting, and editing. However, a real-world facade usually has complex appearances ranging from diffuse rocks with subtle details to large-area glass windows with specular reflections, making it hard to attend to everything. As a result, previous methods can preserve the geometry details but fail to reconstruct smooth glass windows, or vice versa. In order to address this challenge, we introduce three spatial- and semantic-adaptive optimization strategies, including a semantic regularization approach based on zero-shot segmentation techniques to improve material consistency, a frequency-aware geometry regularization to balance surface smoothness and details in different surfaces, and a visibility probe-based scheme to enable efficient modeling of the local lighting in large-scale outdoor environments. In addition, we capture a real-world facade aerial 3D scanning image set and corresponding point clouds for training and benchmarking. The experiment demonstrates the superior quality of our method on facade holistic inverse rendering, novel view synthesis, and scene editing compared to state-of-the-art baselines.
https://arxiv.org/abs/2311.11825
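For background on why an SDF is a convenient geometry representation for rendering, the classic sphere-tracing loop is worth sketching. This is generic textbook machinery, not the paper's renderer; the analytic sphere and the parameters below are illustrative.

```python
# Sphere tracing: march a ray through a signed distance field, advancing by the
# SDF value (a safe step, since nothing can be closer than that distance).

def sphere_sdf(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance from point p to an analytic sphere."""
    d = sum((p[i] - center[i]) ** 2 for i in range(3)) ** 0.5
    return d - radius

def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-4, far=100.0):
    """Return the hit depth along a unit-direction ray, or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(origin[i] + t * direction[i] for i in range(3))
        d = sdf(p)
        if d < eps:
            return t     # surface reached
        t += d           # safe step forward
        if t > far:
            return None  # left the scene
    return None

hit = sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sphere_sdf)   # ray toward sphere
miss = sphere_trace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), sphere_sdf)  # ray past it
```

A neural SDF swaps the analytic `sphere_sdf` for a learned network; the marching logic, and hence differentiable rendering over it, stays structurally the same.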
Object detection in aerial images is a pivotal task for various earth observation applications, whereas current algorithms learn to detect only a pre-defined set of object categories demanding sufficient bounding-box annotated training samples and fail to detect novel object categories. In this paper, we consider open-vocabulary object detection (OVD) in aerial images that enables the characterization of new objects beyond training categories on the earth surface without annotating training images for these new categories. The performance of OVD depends on the quality of class-agnostic region proposals and pseudo-labels that can generalize well to novel object categories. To simultaneously generate high-quality proposals and pseudo-labels, we propose CastDet, a CLIP-activated student-teacher open-vocabulary object Detection framework. Our end-to-end framework within the student-teacher mechanism employs the CLIP model as an extra omniscient teacher, bringing rich knowledge into the student-teacher self-learning process. By doing so, our approach boosts novel object proposals and classification. Furthermore, we design a dynamic label queue technique to maintain high-quality pseudo labels during batch training and mitigate label imbalance. We conduct extensive experiments on multiple existing aerial object detection datasets, which are set up for the OVD task. Experimental results demonstrate our CastDet achieving superior open-vocabulary detection performance, e.g., reaching 40.0 HM (Harmonic Mean), which outperforms previous methods Detic/ViLD by 26.9/21.1 on the VisDroneZSD dataset.
https://arxiv.org/abs/2311.11646
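A dynamic label queue in the spirit described above can be sketched as a capped per-class store that evicts the least confident pseudo-label, which both filters noise and bounds class imbalance. The capacity, eviction rule, and data shapes here are assumptions for illustration, not CastDet's implementation.

```python
import heapq

class DynamicLabelQueue:
    """Keep at most `cap` pseudo-labels per class, highest-confidence first."""

    def __init__(self, cap=3):
        self.cap = cap
        self.queues = {}  # class name -> min-heap of (confidence, label_id)

    def push(self, cls, confidence, label_id):
        q = self.queues.setdefault(cls, [])
        heapq.heappush(q, (confidence, label_id))
        if len(q) > self.cap:
            heapq.heappop(q)  # evict the least confident pseudo-label

    def labels(self, cls):
        """Current pseudo-labels for a class, most confident first."""
        return sorted(self.queues.get(cls, []), reverse=True)

dlq = DynamicLabelQueue(cap=2)
for score, lid in [(0.9, "a"), (0.4, "b"), (0.7, "c")]:
    dlq.push("drone", score, lid)
```

During batch training, sampling pseudo-labels from such per-class queues (rather than from the raw teacher output) keeps the label distribution both high-quality and balanced.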
Neural networks based on convolutional operations have achieved remarkable results in the field of deep learning, but there are two inherent flaws in standard convolutional operations. On the one hand, the convolution operation is confined to a local window, cannot capture information from other locations, and its sampling shape is fixed. On the other hand, the size of the convolutional kernel is fixed to a square of k $\times$ k, and the number of parameters tends to grow quadratically with size. It is obvious that the shape and size of targets vary across datasets and locations. Convolutional kernels with fixed sampling shapes and square layouts do not adapt well to changing targets. In response to these issues, the Alterable Kernel Convolution (AKConv) is explored in this work, which gives the convolution kernel an arbitrary number of parameters and arbitrary sampling shapes to provide richer options for the trade-off between network overhead and performance. In AKConv, we define initial positions for convolutional kernels of arbitrary size by means of a new coordinate-generation algorithm. To adapt to changes in targets, we introduce offsets to adjust the shape of the samples at each position. Moreover, we explore the effect on the neural network of using AKConv with the same size but different initial sampling shapes. AKConv completes the process of efficient feature extraction via irregular convolutional operations and brings more exploration options for convolutional sampling shapes. Object detection experiments on the representative datasets COCO2017, VOC 7+12, and VisDrone-DET2021 fully demonstrate the advantages of AKConv. AKConv can be used as a plug-and-play convolutional operation to replace standard convolutions and improve network performance. The code for the relevant tasks can be found at this https URL.
https://arxiv.org/abs/2311.11587
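One plausible way to generate initial sampling coordinates for a kernel with an arbitrary parameter count n (the exact algorithm in the paper may differ; this is an illustrative guess) is to fill rows of width floor(sqrt(n)) and put any remainder in a final partial row, so n need not be a perfect square.

```python
import math

def initial_coords(n):
    """Initial integer sampling positions for a kernel with n parameters:
    rows of width floor(sqrt(n)), remainder in a trailing partial row."""
    base = int(math.sqrt(n))
    coords = []
    for i in range(n):
        row, col = divmod(i, base)
        coords.append((row, col))
    return coords

coords5 = initial_coords(5)  # a non-square kernel: 2x2 block plus one extra tap
```

Learned per-position offsets would then be added to these integer coordinates at runtime, deforming the sampling shape toward each target, which is the "alterable" part of AKConv.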
Many countries such as Saudi Arabia, Morocco and Tunisia are among the top exporters and consumers of palm date fruits. Date fruit production plays a major role in the economies of the date fruit exporting countries. Date fruits are susceptible to disease just like any fruit, and early detection and intervention can end up saving the produce. However, with the vast farming lands, it is nearly impossible for farmers to observe date trees frequently enough for early disease detection. In addition, even with human observation the process is prone to human error and increases the date fruit cost. With the recent advances in computer vision, machine learning, drone technology, and other technologies, an integrated solution can be proposed for the automatic detection of date fruit disease. In this paper, a hybrid-features-based method with standard classifiers is proposed, based on the extraction of L*a*b color features, statistical features, and Discrete Wavelet Transform (DWT) texture features, for the early detection and classification of date fruit disease. A dataset was developed for this work consisting of 871 images divided into the following classes: Healthy date, Initial stage of disease, Malnourished date, and Parasite infected. The extracted features were input to common classifiers such as Random Forest (RF), Multilayer Perceptron (MLP), Naïve Bayes (NB), and Fuzzy Decision Trees (FDT). The highest average accuracy was achieved when combining the L*a*b, statistical, and DWT features.
https://arxiv.org/abs/2311.10365
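The hybrid feature idea can be loosely sketched as channel statistics plus one level of Haar wavelet energy as a stand-in for the DWT texture features. The pixel values below are invented, and the real pipeline also includes L*a*b color conversion, omitted here for brevity.

```python
# Simple hand-rolled feature extractors: per-channel statistics and the
# energies of a one-level 1D Haar DWT (approximation vs. detail bands).

def channel_stats(channel):
    """Mean and variance of a flat list of pixel values."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((p - mean) ** 2 for p in channel) / n
    return mean, var

def haar_energies(signal):
    """Energies of one-level Haar approximation and detail coefficients."""
    approx = [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]
    return sum(a * a for a in approx), sum(d * d for d in detail)

gray = [10, 10, 10, 10, 200, 0, 200, 0]  # smooth half, then a textured half
mean, var = channel_stats(gray)
e_approx, e_detail = haar_energies(gray)
feature_vector = [mean, var, e_approx, e_detail]  # fed to RF/MLP/NB/FDT
```

High detail-band energy flags texture (e.g., parasite damage) that plain color statistics miss, which is why the combined feature set scored highest.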
Swarms of smart drones, with the support of charging technology, can provide complete sensing capabilities in Smart Cities, such as traffic monitoring and disaster response. Existing approaches, including distributed optimization and deep reinforcement learning (DRL), aim to coordinate drones to achieve cost-effective, high-quality navigation, sensing, and recharging. However, they have distinct challenges: short-term optimization struggles to provide sustained benefits, while long-term DRL lacks scalability, resilience, and flexibility. To bridge this gap, this paper introduces a new progressive approach that encompasses planning and selection based on distributed optimization, as well as DRL-based flying direction scheduling. Extensive experiments with datasets generated from realistic urban mobility demonstrate the outstanding performance of the proposed solution in traffic monitoring compared to three baseline methods.
https://arxiv.org/abs/2311.09852
This work introduces a novel approach for Titan exploration based on soft morphing aerial robots leveraging the use of flexible adaptive materials. The controlled deformation of the multirotor arms, actuated by a combination of a pneumatic system and a tendon mechanism, provides the explorer robot with the ability to perform full-body perching and land on rocky, irregular, or uneven terrains, thus unlocking new exploration horizons. In addition, after landing, they can be used for efficient sampling as tendon-driven continuum manipulators, with the pneumatic system drawing in the samples. The proposed arms enable the drone to cover long distances in Titan's atmosphere efficiently, by directing rotor thrust without rotating the body, reducing the aerodynamic drag. Given that the exploration concept is envisioned as a rotorcraft planetary lander, the robot's folding features enable over a 30% reduction in the hypersonic aeroshell's diameter. Building on this folding capability, the arms can morph partially in flight to navigate tight spaces. As for propulsion, the rotor design, justified through CFD simulations, utilizes a ducted fan configuration tailored for Titan's high Reynolds numbers. The rotors are integrated within the robot's deformable materials, facilitating smooth interactions with the environment. The research spotlights exploration simulations in the Gazebo environment, focusing on the Sotra-Patera cryovolcano region, a location with potential to clarify Titan's unique methane cycle and its Earth-like features. This work addresses one of the primary challenges of the concept by testing the behavior of small-scale deformable arms under conditions mimicking those of Titan. Groundbreaking experiments with liquid nitrogen at cryogenic temperatures were conducted on various materials, with Teflon (PTFE) at low infill rates (15-30%) emerging as a promising option.
https://arxiv.org/abs/2311.08952
Deep Neural Networks (DNNs) are extremely computationally demanding, which presents a large barrier to their deployment on resource-constrained devices. Since such devices are where many emerging deep learning applications lie (e.g., drones, vision-based medical technology), significant bodies of work from both the machine learning and systems communities have attempted to provide optimizations to accelerate DNNs. To help unify these two perspectives, in this paper we combine machine learning and systems techniques within the Deep Learning Acceleration Stack (DLAS), and demonstrate how these layers can be tightly dependent on each other with an across-stack perturbation study. We evaluate the impact on accuracy and inference time when varying different parameters of DLAS across two datasets, seven popular DNN architectures, four DNN compression techniques, three algorithmic primitives with sparse and dense variants, untuned and auto-scheduled code generation, and four hardware platforms. Our evaluation highlights how perturbations across DLAS parameters can cause significant variation and across-stack interactions. The highest level observation from our evaluation is that the model size, accuracy, and inference time are not guaranteed to be correlated. Overall we make 13 key observations, including that speedups provided by compression techniques are very hardware dependent, and that compiler auto-tuning can significantly alter what the best algorithm to use for a given configuration is. With DLAS, we aim to provide a reference framework to aid machine learning and systems practitioners in reasoning about the context in which their respective DNN acceleration solutions exist. With our evaluation strongly motivating the need for co-design, we believe that DLAS can be a valuable concept for exploring the next generation of co-designed accelerated deep learning solutions.
https://arxiv.org/abs/2311.08909
Perception contracts provide a method for evaluating safety of control systems that use machine learning for perception. A perception contract is a specification for testing the ML components, and it gives a method for proving end-to-end system-level safety requirements. The feasibility of contract-based testing and assurance was established earlier in the context of straight lane keeping: a 3-dimensional system with relatively simple dynamics. This paper presents the analysis of two flight control systems, of 6 and 12 dimensions, that use multi-stage, heterogeneous, ML-enabled perception. The paper advances the methodology by introducing an algorithm for constructing data- and requirement-guided refinement of perception contracts (DaRePC). The resulting analysis provides testable contracts which establish the state and environment conditions under which an aircraft can safely touch down on the runway and a drone can safely pass through a sequence of gates. It can also discover conditions (e.g., low-horizon sun) that can possibly violate the safety of the vision-based control system.
https://arxiv.org/abs/2311.08652
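A perception contract can be rendered as an executable check: whenever its precondition on the environment holds, the ML estimate must be within a stated error bound of ground truth. The quantities, bound, and sun-elevation precondition below are invented for illustration; DaRePC itself refines such contracts from data and requirements rather than hard-coding them.

```python
# Toy perception contract: lateral-position estimates must be within `bound`
# of ground truth, but only under the contract's assumed operating conditions
# (here: sun sufficiently above the horizon).

def contract_holds(samples, bound=0.5, min_sun_elevation=10.0):
    """samples: dicts with 'truth', 'estimate', 'sun_elevation' (degrees)."""
    for s in samples:
        if s["sun_elevation"] < min_sun_elevation:
            continue  # precondition fails: the contract makes no promise here
        if abs(s["estimate"] - s["truth"]) > bound:
            return False  # in-scope sample violates the error bound
    return True

logs = [
    {"truth": 1.0, "estimate": 1.2, "sun_elevation": 45.0},
    {"truth": 0.0, "estimate": 2.0, "sun_elevation": 3.0},  # low-horizon sun: exempt
]
ok = contract_holds(logs)
```

Samples that are exempt but wildly wrong (like the second log entry) are exactly how such testing surfaces hazardous conditions, such as low-horizon sun, that sit outside the contract's assumptions.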
The integration of machine learning (ML) into cyber-physical systems (CPS) offers significant benefits, including enhanced efficiency, predictive capabilities, real-time responsiveness, and the enabling of autonomous operations. This convergence has accelerated the development and deployment of a range of real-world applications, such as autonomous vehicles, delivery drones, service robots, and telemedicine procedures. However, the software development life cycle (SDLC) for AI-infused CPS diverges significantly from traditional approaches, featuring data and learning as two critical components. Existing verification and validation techniques are often inadequate for these new paradigms. In this study, we pinpoint the main challenges in ensuring formal safety for learning-enabled CPS. We begin by examining testing as the most pragmatic method for verification and validation, summarizing the current state-of-the-art methodologies. Recognizing the limitations in current testing approaches to provide formal safety guarantees, we propose a roadmap to transition from foundational probabilistic testing to a more rigorous approach capable of delivering formal assurance.
https://arxiv.org/abs/2311.07377
Socially aware robot navigation is gaining popularity with the increase in delivery and assistive robots. The research is further fueled by a need for socially aware navigation skills in autonomous vehicles to move safely and appropriately in spaces shared with humans. Although most of these are ground robots, drones are also entering the field. In this paper, we present a literature survey of the works on socially aware robot navigation in the past 10 years. We propose four different faceted taxonomies to navigate the literature and examine the field from four different perspectives. Through the taxonomic review, we discuss the current research directions and the extending scope of applications in various domains. Further, we put forward a list of current research opportunities and present a discussion on possible future challenges that are likely to emerge in the field.
https://arxiv.org/abs/2311.06922
Multi-Object Tracking is one of the most important technologies in maritime computer vision. Our solution explores Multi-Object Tracking in maritime Unmanned Aerial Vehicle (UAV) and Unmanned Surface Vehicle (USV) usage scenarios. Most current Multi-Object Tracking algorithms require complex association strategies and association information (2D location and motion, 3D motion, 3D depth, 2D appearance) to achieve better performance, which makes the entire tracking system extremely complex and heavy. At the same time, most current Multi-Object Tracking algorithms still require video annotation data, which is costly to obtain for training. Our solution explores Multi-Object Tracking in a completely unsupervised way. The scheme accomplishes instance representation learning by using self-supervision on ImageNet. Then, by cooperating with high-quality detectors, the multi-target tracking task can be completed simply and efficiently. The scheme achieved top-3 performance on both the UAV-based Multi-Object Tracking with Reidentification and the USV-based Multi-Object Tracking benchmarks, and the solution has won championships in multiple Multi-Object Tracking competitions, such as BDD100K MOT, MOTS, and Waymo 2D MOT.
https://arxiv.org/abs/2311.07616
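The appearance-only association this scheme relies on can be sketched as greedy cosine matching between track embeddings and detection embeddings. This is a simplified illustration with made-up 2D embeddings; the competition system is considerably more involved.

```python
# Greedy appearance association: each track claims its best unmatched detection
# by cosine similarity of instance embeddings (from self-supervised features).

def cosine(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def associate(tracks, detections, threshold=0.6):
    """tracks/detections: id -> embedding. Returns track_id -> detection_id."""
    matches, used = {}, set()
    for tid, temb in tracks.items():
        scored = sorted(
            ((cosine(temb, demb), did)
             for did, demb in detections.items() if did not in used),
            reverse=True,
        )
        if scored and scored[0][0] >= threshold:
            matches[tid] = scored[0][1]
            used.add(scored[0][1])
    return matches

tracks = {"t1": [1.0, 0.0], "t2": [0.0, 1.0]}
detections = {"d1": [0.1, 0.99], "d2": [0.98, 0.05]}
matches = associate(tracks, detections)
```

Because the embeddings come from self-supervision on ImageNet and the boxes from an off-the-shelf detector, no video annotation is needed anywhere in this loop, which is the point of the fully unsupervised design.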