Autonomous driving systems require quick and robust perception of the nearby environment to carry out their routines effectively. To avoid collisions and drive safely, autonomous driving systems rely heavily on object detection. However, 2D object detections alone are insufficient; more information, such as relative velocity and distance, is required for safer planning. Monocular 3D object detectors address this problem by directly predicting 3D bounding boxes and object velocities from a camera image. Recent research estimates time-to-contact in a per-pixel manner and suggests that it is a more effective measure than velocity and depth combined. However, per-pixel time-to-contact still requires object detection to serve its purpose effectively, and hence increases overall computational requirements, as two different models need to run. To address this issue, we propose per-object time-to-contact estimation by extending object detection models to additionally predict a time-to-contact attribute for each object. We compare our proposed approach with existing time-to-contact methods and provide benchmarking results on well-known datasets. Our proposed approach achieves higher precision than prior art while using a single image.
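The scale-change intuition behind time-to-contact can be sketched in a few lines. This is the classical two-frame approximation, not the learned per-object head proposed above; the function name and the constant-velocity assumption are illustrative:

```python
def time_to_contact(h_prev, h_curr, dt):
    """Classical TTC approximation from apparent size growth.

    Under constant closing velocity, tau ~= h / (dh/dt): the faster an
    object's image height grows, the sooner contact occurs.
    """
    growth_rate = (h_curr - h_prev) / dt  # pixels per second
    if growth_rate <= 0:
        return float("inf")  # static or receding object: no contact
    return h_curr / growth_rate
```

A learned per-object head would regress this quantity directly from single-image features, which is what makes the single-image setting above possible.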
https://arxiv.org/abs/2405.07698
Current multi-modality driving frameworks normally fuse representations by applying attention between single-modality branches. However, existing networks still limit driving performance because the image and LiDAR branches are independent and lack a unified observation representation. Thus, this paper proposes MaskFuser, which tokenizes various modalities into a unified semantic feature space and provides a joint representation for further behavior cloning in driving contexts. Given the unified token representation, MaskFuser is the first work to introduce cross-modality masked auto-encoder training. The masked training enhances the fusion representation through reconstruction on masked tokens. Architecturally, a hybrid-fusion network is proposed to combine the advantages of both early and late fusion: in the early fusion stage, modalities are fused by performing monotonic-to-BEV translation attention between branches; late fusion is performed by tokenizing the various modalities into a unified token space with shared encoding on it. MaskFuser reaches a driving score of 49.05 and route completion of 92.85% on the CARLA LongSet6 benchmark, improving on the best of the previous baselines by 1.74 and 3.21%, respectively. The introduced masked fusion also increases driving stability under damaged sensory inputs: MaskFuser outperforms the best of the previous baselines on driving score by 6.55 (27.8%), 1.53 (13.8%), and 1.57 (30.9%) given sensory masking ratios of 25%, 50%, and 75%, respectively.
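The masked-token training idea can be illustrated with a minimal random-masking routine. This is a generic sketch of MAE-style masking; the function name and the uniform sampling strategy are assumptions, not MaskFuser's implementation:

```python
import numpy as np

def mask_tokens(tokens, ratio=0.5, rng=None):
    """Split a token sequence into visible and masked subsets.

    tokens: (N, D) array of fused modality tokens. Returns the visible
    tokens plus both index sets, so a decoder can be trained to
    reconstruct the tokens at the masked positions.
    """
    rng = rng or np.random.default_rng(0)
    n = tokens.shape[0]
    n_mask = int(round(n * ratio))
    order = rng.permutation(n)
    masked_idx, visible_idx = order[:n_mask], order[n_mask:]
    return tokens[visible_idx], visible_idx, masked_idx
```

Training the encoder-decoder to reconstruct the masked subset is what forces the fused representation to carry redundant cross-modality information, which is also why the fused model degrades gracefully when sensors are damaged.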
https://arxiv.org/abs/2405.07573
Robust 3D object detection remains a pivotal concern in the domain of autonomous field robotics. Despite notable enhancements in detection accuracy across standard datasets, real-world urban environments, characterized by their unstructured and dynamic nature, frequently precipitate an elevated incidence of false positives, thereby undermining the reliability of existing detection paradigms. In this context, our study introduces an advanced post-processing algorithm that modulates detection thresholds dynamically relative to the distance from the ego vehicle. Traditional perception systems typically utilize a uniform threshold, which often leads to decreased efficacy in detecting distant objects. In contrast, our proposed methodology employs a neural network with a self-adaptive thresholding mechanism that significantly attenuates false negatives while concurrently diminishing false positives, particularly in complex urban settings. Empirical results substantiate that our algorithm not only augments the performance of 3D object detection models in diverse urban and adverse weather scenarios but also establishes a new benchmark for adaptive thresholding techniques in field robotics.
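A distance-modulated threshold of this kind can be sketched as a simple post-processing rule. The linear schedule and all constants below are illustrative placeholders; the paper learns the schedule with a neural network:

```python
def adaptive_threshold(distance_m, near=0.6, far=0.3, max_range_m=80.0):
    """Linearly relax the confidence threshold with distance from the ego vehicle."""
    frac = min(max(distance_m / max_range_m, 0.0), 1.0)
    return near + (far - near) * frac

def filter_detections(detections):
    """detections: list of (confidence, distance_m) tuples."""
    return [d for d in detections if d[0] >= adaptive_threshold(d[1])]
```

With this rule, a low-confidence distant box that a uniform 0.6 threshold would discard survives (fewer false negatives at range), while the same confidence close to the vehicle is still rejected (fewer false positives nearby).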
https://arxiv.org/abs/2405.07479
Blackgrass (Alopecurus myosuroides) is a competitive weed that has wide-ranging impacts on food security by reducing crop yields and increasing cultivation costs. In addition to the financial burden on agriculture, the application of herbicides as a preventive against blackgrass can negatively affect access to clean water and sanitation. The WeedScout project introduces a Real-Time Autonomous Black-Grass Classification and Mapping (RT-ABGCM) system, a cutting-edge solution tailored for real-time detection of blackgrass in support of precision weed management practices. Leveraging Artificial Intelligence (AI) algorithms, the system processes live image feeds, infers blackgrass density, and covers two stages of maturation. The research investigates the deployment of You Only Look Once (YOLO) models, specifically the streamlined YOLOv8 and YOLO-NAS, accelerated at the edge with the NVIDIA Jetson Nano (NJN). By optimising inference speed and model performance, the project advances the integration of AI into agricultural practices, offering potential solutions to challenges such as herbicide resistance and environmental impact. Additionally, two datasets and model weights are made available to the research community, facilitating further advancements in weed detection and precision farming technologies.
https://arxiv.org/abs/2405.07349
Despite the rapid advancement in the field of image recognition, the processing of high-resolution imagery remains a computational challenge. However, this processing is pivotal for extracting detailed object insights in areas ranging from autonomous vehicle navigation to medical imaging analyses. Our study introduces a framework aimed at mitigating these challenges by leveraging memory-efficient patch-based processing for high-resolution images. It incorporates a global context representation alongside local patch information, enabling a comprehensive understanding of the image content. In contrast to traditional training methods, which are limited by memory constraints, our method enables training on ultra-high-resolution images. We demonstrate the effectiveness of our method through superior performance on 7 different benchmarks across classification, object detection, and segmentation. Notably, the proposed method achieves strong performance even on resource-constrained devices like Jetson Nano. Our code is available at this https URL.
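The local-patch-plus-global-context decomposition can be sketched as follows. This is a minimal illustration; the patch size and the strided global view are assumptions, not the paper's architecture:

```python
import numpy as np

def patches_with_context(image, patch=64):
    """Split an image into non-overlapping patches and a coarse global view.

    The patches preserve local detail at full resolution, so only one
    patch needs to be in memory at a time; the strided subsample stands
    in for the global context representation that accompanies each patch.
    """
    h, w = image.shape[:2]
    patches = [image[i:i + patch, j:j + patch]
               for i in range(0, h, patch)
               for j in range(0, w, patch)]
    global_view = image[::patch, ::patch]
    return patches, global_view
```

The memory saving comes from the fact that the peak footprint scales with the patch size plus the small global view, rather than with the full image resolution.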
https://arxiv.org/abs/2405.07166
Pedestrian trajectory prediction plays a pivotal role in the realms of autonomous driving and smart cities. Despite extensive prior research employing sequence and generative models, the unpredictable nature of pedestrians, influenced by their social interactions and individual preferences, presents challenges marked by uncertainty and multimodality. In response, we propose the Energy Plan Denoising (EPD) model for stochastic trajectory prediction. EPD initially provides a coarse estimation of the distribution of future trajectories, termed the Plan, utilizing the Langevin Energy Model. Subsequently, it refines this estimation through denoising via the Probabilistic Diffusion Model. By initiating denoising with the Plan, EPD effectively reduces the need for iterative steps, thereby enhancing efficiency. Furthermore, EPD differs from conventional approaches by modeling the distribution of trajectories instead of individual trajectories. This allows for the explicit modeling of pedestrian intrinsic uncertainties and eliminates the need for multiple denoising operations. A single denoising operation produces a distribution from which multiple samples can be drawn, significantly enhancing efficiency. Moreover, EPD's fine-tuning of the Plan contributes to improved model performance. We validate EPD on two publicly available datasets, where it achieves state-of-the-art results. Additionally, ablation experiments underscore the contributions of individual modules, affirming the efficacy of the proposed approach.
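The Langevin dynamics underlying the coarse Plan estimate follows the standard update x ← x + (ε/2)∇log p(x) + √ε·η. Below is a minimal sketch of generic unadjusted Langevin sampling, not the paper's trained energy model:

```python
import numpy as np

def langevin_sample(grad_log_p, x0, steps=200, step_size=0.05, rng=None):
    """Unadjusted Langevin dynamics: drift up the log-density plus noise."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        noise = rng.normal(size=x.shape)
        x = x + 0.5 * step_size * grad_log_p(x) + np.sqrt(step_size) * noise
    return x
```

With grad_log_p(x) = -x (a standard Gaussian), chains settle around the origin; in EPD the samples would instead be coarse future-trajectory plans that the diffusion model then denoises.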
https://arxiv.org/abs/2405.07164
The exploration of high-speed movement by robots or road traffic agents is crucial for autonomous driving and navigation. Trajectory prediction at high speeds requires considering historical features and interactions with surrounding entities, a complexity not as pronounced in lower-speed environments. Prior methods have assessed the spatio-temporal dynamics of agents but often neglected intrinsic intent and uncertainty, thereby limiting their effectiveness. We present the Denoised Endpoint Distribution model for trajectory prediction, which distinctively models agents' spatio-temporal features alongside their intrinsic intentions and uncertainties. By employing Diffusion and Transformer models to focus on agent endpoints rather than entire trajectories, our approach significantly reduces model complexity and enhances performance through endpoint information. Our experiments on open datasets, coupled with comparison and ablation studies, demonstrate our model's efficacy and the importance of its components. This approach advances trajectory prediction in high-speed scenarios and lays groundwork for future developments.
https://arxiv.org/abs/2405.07041
To address the limitations inherent to conventional automated harvesting robots, specifically their suboptimal success rates and risk of crop damage, we design a novel robot named AHPPEBot, which is capable of autonomous harvesting based on crop phenotyping and pose estimation. Specifically, in phenotyping, the detection, association, and maturity estimation of tomato trusses and individual fruits are accomplished through a multi-task YOLOv5 model coupled with a detection-based adaptive DBSCAN clustering algorithm. In pose estimation, we employ a deep learning model to predict seven semantic keypoints on the pedicel. These keypoints assist in the robot's path planning, minimize target contact, and facilitate the use of our specialized end effector for harvesting. In autonomous tomato harvesting experiments conducted in commercial greenhouses, our proposed robot achieved a harvesting success rate of 86.67%, with an average successful harvest time of 32.46 s, showcasing its continuous and robust harvesting capabilities. The result underscores the potential of harvesting robots to bridge the labor gap in agriculture.
https://arxiv.org/abs/2405.06959
Humans and many animal species have the ability to gather, transfer, process, fine-tune, and generate information throughout their lifetimes. This lifelong learning ability, referred to as continual learning, relies on neurocognitive mechanisms. Consequently, real-world computational systems of incrementally learning autonomous agents also need such a continual learning mechanism, providing retrieval of information and long-term memory consolidation. A main challenge in artificial intelligence, however, is the incremental learning of an autonomous agent when it is confronted with new data. In such scenarios, the central concern is catastrophic forgetting (CF): while learning sequentially, a neural network underfits the old data when it is confronted with new data. Numerous studies have been proposed to tackle this CF problem, but it is very difficult to compare their performance due to dissimilarities in their evaluation mechanisms. Here we focus on comparing algorithms that share a similar type of evaluation mechanism, covering three types of incremental learning methods: (1) exemplar-based methods, (2) memory-based methods, and (3) network-based methods. This survey paper presents a methodology-oriented study of catastrophic forgetting in incremental deep neural networks. Furthermore, it contains a mathematical overview of impactful methods that can help researchers deal with CF.
https://arxiv.org/abs/2405.08015
While our understanding of fairness in machine learning has significantly progressed, our understanding of fairness in reinforcement learning (RL) remains nascent. Most of the attention has been on fairness in one-shot classification tasks; however, real-world, RL-enabled systems (e.g., autonomous vehicles) are much more complicated in that agents operate in dynamic environments over a long period of time. To ensure the responsible development and deployment of these systems, we must better understand fairness in RL. In this paper, we survey the literature to provide the most up-to-date snapshot of the frontiers of fairness in RL. We start by reviewing where fairness considerations can arise in RL, then discuss the various definitions of fairness in RL that have been put forth thus far. We continue to highlight the methodologies researchers used to implement fairness in single- and multi-agent RL systems before showcasing the distinct application domains that fair RL has been investigated in. Finally, we critically examine gaps in the literature, such as understanding fairness in the context of RLHF, that still need to be addressed in future work to truly operationalize fair RL in real-world systems.
https://arxiv.org/abs/2405.06909
Robot-assisted bite acquisition involves picking up food items that vary in their shape, compliance, size, and texture. A fully autonomous strategy for bite acquisition is unlikely to efficiently generalize to this wide variety of food items. We propose to leverage the presence of the care recipient to provide feedback when the system encounters novel food items. However, repeatedly asking for help imposes cognitive workload on the user. In this work, we formulate human-in-the-loop bite acquisition within a contextual bandit framework and propose a novel method, LinUCB-QG, which selectively asks for help. This method leverages a predictive model of cognitive workload in response to different types and timings of queries, learned using data from 89 participants collected in an online user study. We demonstrate that this method achieves a better balance between task performance and cognitive workload than autonomous and always-querying baselines, through experiments in a food dataset-based simulator and a user study with 18 participants without mobility limitations.
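The contextual-bandit backbone here is the standard disjoint LinUCB algorithm, sketched below; the query-gating extension (the "QG" in LinUCB-QG) and the learned workload model are beyond this sketch:

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm, with an
    upper-confidence exploration bonus on each arm's reward estimate."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]       # X^T X + I per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]     # X^T y per arm

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                               # ridge estimate
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In the bite-acquisition setting, arms would correspond to acquisition actions (including "query the user"), and the context would encode the food item; that mapping is an assumption for illustration.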
https://arxiv.org/abs/2405.06908
This paper proposes a novel task named "3D part grouping". Suppose there is a mixed set containing scattered parts from various shapes. This task requires algorithms to find every possible combination among all the parts. To address this challenge, we propose the so-called Gradient Field-based Auto-Regressive Sampling framework (G-FARS), tailored specifically for the 3D part grouping task. In our framework, we design a gradient-field-based selection graph neural network (GNN) to learn the gradients of a log conditional probability density with respect to part selection, where the condition is the given mixed part set. This innovative approach, implemented through the gradient-field-based selection GNN, effectively captures the complex relationships among all the parts in the input. Upon completion of the training process, our framework becomes capable of autonomously grouping 3D parts by iteratively selecting them from the mixed part set, leveraging the knowledge acquired by the trained gradient-field-based selection GNN. Our code is available at: this https URL.
https://arxiv.org/abs/2405.06828
The circular economy paradigm is gaining interest as a solution to reduce both material supply uncertainties and waste generation. One of the main challenges is monitoring materials, since in general, something that is not measured cannot be effectively managed. In this paper, we propose real-time synchronized object detection to enable, at the same time, autonomous sorting, mapping, and quantification of end-of-life medical materials. Dataset, code, and demo videos are publicly available.
https://arxiv.org/abs/2405.06821
Accurate and effective 3D object detection is critical for ensuring the driving safety of autonomous vehicles. Recently, state-of-the-art two-stage 3D object detectors have exhibited promising performance. However, these methods refine proposals individually, ignoring the rich contextual information in the object relationships between neighboring proposals. In this study, we introduce an object relation module, consisting of a graph generator and a graph neural network (GNN), to learn the spatial information from certain patterns to improve 3D object detection. Specifically, we create an inter-object relationship graph based on the proposals in a frame via the graph generator, connecting each proposal with its neighboring proposals. Afterward, the GNN module extracts edge features from the generated graph and iteratively refines proposal features with the captured edge features. Ultimately, we leverage the refined features as input to the detection head to obtain detection results. Our approach improves upon the baseline PV-RCNN on the KITTI validation set for the car class across easy, moderate, and hard difficulty levels by 0.82%, 0.74%, and 0.58%, respectively. Additionally, our method outperforms the baseline by more than 1% in BEV AP under the moderate and hard levels on the test server.
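The graph-generator step can be illustrated by connecting each proposal center to its nearest neighbors. This is a plain kNN construction with an illustrative k; the GNN refinement over the resulting edges is omitted:

```python
import numpy as np

def build_proposal_graph(centers, k=2):
    """Directed edges from each 3D proposal center to its k nearest neighbors.

    centers: (N, 3) array of proposal box centers in one frame.
    Returns a list of (source, target) index pairs for the GNN to consume.
    """
    centers = np.asarray(centers, dtype=float)
    n = len(centers)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self-loops
    return [(i, int(j)) for i in range(n) for j in np.argsort(dist[i])[:k]]
```

Edge features (e.g. relative position and size between the connected proposals) would then be extracted along these pairs and used to iteratively refine each proposal's features.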
https://arxiv.org/abs/2405.06782
The main barrier to achieving fully autonomous flights lies in autonomous aircraft navigation. Managing non-cooperative traffic presents the most important challenge in this problem. The most efficient strategy for handling non-cooperative traffic is based on monocular video processing through deep learning models. This study contributes to the vision-based deep learning aircraft detection and tracking literature by investigating the impact of data corruption arising from environmental and hardware conditions on the effectiveness of these methods. More specifically, we designed 7 types of common corruptions for camera inputs, taking into account real-world flight conditions. By applying these corruptions to the Airborne Object Tracking (AOT) dataset, we constructed the first robustness benchmark dataset, named AOT-C, for air-to-air aerial object detection. The corruptions included in this dataset cover a wide range of challenging conditions such as adverse weather and sensor noise. The second main contribution of this letter is an extensive experimental evaluation involving 8 diverse object detectors, exploring the degradation in performance under escalating levels of corruption (domain shifts). Based on the evaluation results, the key observations that emerge are the following: 1) one-stage detectors of the YOLO family demonstrate better robustness; 2) transformer-based and multi-stage detectors like Faster R-CNN are extremely vulnerable to corruptions; 3) robustness against corruptions is related to the generalization ability of models. The third main contribution is to show that fine-tuning on our augmented synthetic data improves the generalisation ability of the object detector in real-world flight experiments.
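Severity-graded synthetic corruptions of this kind are typically generated per corruption family; a minimal sketch for two families follows. The corruption names, severity scales, and noise levels are illustrative, not the AOT-C recipe:

```python
import numpy as np

def corrupt(image, kind="gaussian_noise", severity=1, rng=None):
    """Apply a synthetic corruption at severity 1..5 to a float image in [0, 1]."""
    rng = rng or np.random.default_rng(0)
    img = image.astype(np.float64)
    if kind == "gaussian_noise":  # stands in for sensor noise
        sigma = [0.04, 0.06, 0.08, 0.10, 0.12][severity - 1]
        img = img + rng.normal(0.0, sigma, img.shape)
    elif kind == "brightness":    # stands in for over-exposure / glare
        img = img + [0.1, 0.2, 0.3, 0.4, 0.5][severity - 1]
    else:
        raise ValueError(f"unknown corruption: {kind}")
    return np.clip(img, 0.0, 1.0)
```

Evaluating a detector on the same test set at each severity level is what produces the degradation-vs-corruption curves that the benchmark compares across the 8 detectors.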
https://arxiv.org/abs/2405.06765
In the last twenty years, unmanned aerial vehicles (UAVs) have garnered growing interest due to their expanding applications in both military and civilian domains. Detecting non-cooperative aerial vehicles with efficiency and estimating collisions accurately are pivotal for achieving fully autonomous aircraft and facilitating Advanced Air Mobility (AAM). This paper presents a deep-learning framework that utilizes optical sensors for the detection, tracking, and distance estimation of non-cooperative aerial vehicles. In implementing this comprehensive sensing framework, the availability of depth information is essential for enabling autonomous aerial vehicles to perceive and navigate around obstacles. In this work, we propose a method for estimating the distance information of a detected aerial object in real time using only the input of a monocular camera. In order to train our deep learning components for the object detection, tracking and depth estimation tasks we utilize the Amazon Airborne Object Tracking (AOT) Dataset. In contrast to previous approaches that integrate the depth estimation module into the object detector, our method formulates the problem as image-to-image translation. We employ a separate lightweight encoder-decoder network for efficient and robust depth estimation. In a nutshell, the object detection module identifies and localizes obstacles, conveying this information to both the tracking module for monitoring obstacle movement and the depth estimation module for calculating distances. Our approach is evaluated on the Airborne Object Tracking (AOT) dataset which is the largest (to the best of our knowledge) air-to-air airborne object dataset.
https://arxiv.org/abs/2405.06749
This paper presents a novel approach to modeling human driving behavior, designed for use in evaluating autonomous vehicle control systems in a simulation environment. Our methodology leverages a hierarchical, forward-looking, risk-aware estimation framework with learned parameters to generate human-like driving trajectories, accommodating multiple driver levels determined by model parameters. This approach is grounded in multimodal trajectory prediction, using a deep neural network with LSTM-based social pooling to predict the trajectories of surrounding vehicles. These trajectories are used to compute forward-looking risk assessments along the ego vehicle's path, guiding its navigation. Our method aims to replicate human driving behaviors by learning parameters that emulate human decision-making during driving. We ensure that our model exhibits robust generalization capabilities by conducting simulations, employing real-world driving data to validate the accuracy of our approach in modeling human behavior. The results reveal that our model effectively captures human behavior, showcasing its versatility in modeling human drivers in diverse highway scenarios.
https://arxiv.org/abs/2405.06578
The technology of autonomous driving is currently attracting a great deal of interest in both research and industry. In this paper, we present a deep learning dual-model solution that uses two deep neural networks for combined braking and steering in autonomous vehicles. Steering control is achieved by applying NVIDIA's PilotNet model to predict the steering wheel angle, while braking control relies on the use of MobileNet SSD. Both models rely on a single front-facing camera for image input. The MobileNet SSD model is suitable for devices with constrained resources, whereas PilotNet struggles to operate efficiently on smaller devices with limited resources. To make it suitable for such devices, we modified the PilotNet model using our own original network design and reduced the number of model parameters and its memory footprint by approximately 60%. The inference latency has also been reduced, making the model more suitable to operate on resource-constrained devices. The modified PilotNet model achieves similar loss and accuracy compared to the original PilotNet model. When evaluated in a simulated environment, both autonomous driving systems, one using the modified PilotNet model and the other using the original PilotNet model for steering, show similar levels of autonomous driving performance.
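The scale of such a parameter reduction can be sanity-checked with simple layer arithmetic. The layer sizes below follow the commonly cited PilotNet convolutional stack, and the "slim" variant that halves every channel width is a hypothetical example, not the paper's actual redesign:

```python
def conv_params(c_in, c_out, k):
    """Weights plus biases of a single k x k convolution layer."""
    return c_out * (c_in * k * k + 1)

# (in_channels, out_channels, kernel) for each conv layer
PILOTNET = [(3, 24, 5), (24, 36, 5), (36, 48, 5), (48, 64, 3), (64, 64, 3)]
SLIM = [(3, 12, 5), (12, 18, 5), (18, 24, 5), (24, 32, 3), (32, 32, 3)]

orig = sum(conv_params(*layer) for layer in PILOTNET)
slim = sum(conv_params(*layer) for layer in SLIM)
reduction = 1.0 - slim / orig
```

Because each layer's cost scales with the product of its input and output widths, halving every width cuts the convolutional parameter count by roughly 75%; a ~60% overall reduction is therefore plausible with a milder width schedule and the fully connected layers included.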
https://arxiv.org/abs/2405.06473
Federated learning (FL) offers a privacy-centric distributed learning framework, enabling model training on individual clients and central aggregation without necessitating data exchange. Nonetheless, FL implementations often suffer from non-i.i.d. and long-tailed class distributions across mobile applications, e.g., autonomous vehicles, which leads models to overfit, as local training may converge to sub-optimal solutions. In our study, we explore the impact of data heterogeneity on model bias and introduce an innovative personalized FL framework, Multi-level Personalized Federated Learning (MuPFL), which leverages the hierarchical architecture of FL to fully harness computational resources at various levels. This framework integrates three pivotal modules: Biased Activation Value Dropout (BAVD) to mitigate overfitting and accelerate training; Adaptive Cluster-based Model Update (ACMU) to refine local models, ensuring coherent global aggregation; and Prior Knowledge-assisted Classifier Fine-tuning (PKCF) to bolster classification and personalize models in accord with skewed local data using shared knowledge. Extensive experiments on diverse real-world datasets for image classification and semantic segmentation validate that MuPFL consistently outperforms state-of-the-art baselines, even under extreme non-i.i.d. and long-tail conditions, enhancing accuracy by as much as 7.39% and accelerating training by up to 80%, marking significant advancements in both efficiency and effectiveness.
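For reference, the central aggregation step that hierarchical schemes like MuPFL build on is the standard size-weighted average of client parameters (plain FedAvg); the BAVD, ACMU, and PKCF modules are not shown:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client model parameters by local dataset size.

    client_weights: list of parameter arrays, one per client.
    client_sizes: number of local training examples per client.
    """
    total = sum(client_sizes)
    agg = np.zeros_like(np.asarray(client_weights[0], dtype=float))
    for w, n in zip(client_weights, client_sizes):
        agg += (n / total) * np.asarray(w, dtype=float)
    return agg
```

Under long-tailed, non-i.i.d. data this plain average is exactly where the bias arises: large clients dominate the sum, which motivates the cluster-based updates and classifier fine-tuning described above.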
https://arxiv.org/abs/2405.06413
Techniques for autonomously driving research have become prominent in Computational Scientific Discovery, while Synthetic Biology is a field of science that focuses on designing and constructing new biological systems for useful purposes. Here we seek to apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery. Comprehensive databases of metabolic processes called genome-scale metabolic network models (GEMs) are often used to evaluate cellular engineering strategies to optimise target compound production. However, predicted host behaviours are not always correctly described by GEMs, often due to errors in the models. The task of learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To address these, we describe a novel approach called Boolean Matrix Logic Programming (BMLP), leveraging boolean matrices to evaluate large logic programs. We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by guiding informative experimentation through active learning. In contrast to sub-symbolic methods, $BMLP_{active}$ encodes a state-of-the-art GEM of a widely accepted bacterial host in an interpretable and logical representation using datalog logic programs. Notably, $BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. $BMLP_{active}$ enables rapid optimisation of metabolic models to reliably engineer biological systems for producing useful compounds. It offers a realistic approach to creating a self-driving lab for microbial engineering.
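The core BMLP idea of evaluating recursive logic programs with boolean matrices can be illustrated by computing reachability (the transitive closure of a relation) via repeated 0/1 matrix products. This generic sketch is not the $BMLP_{active}$ system itself:

```python
import numpy as np

def transitive_closure(adj):
    """Fixpoint of R := R OR (R * R), with * the boolean matrix product.

    Encodes the recursive datalog rule
        path(X, Z) :- path(X, Y), path(Y, Z)
    as repeated matrix multiplication until no new facts are derived.
    """
    r = (np.asarray(adj) != 0).astype(int)
    while True:
        nxt = ((r + r @ r) > 0).astype(int)
        if np.array_equal(nxt, r):
            return r
        r = nxt
```

In a GEM setting, the relation would encode dependencies between metabolic reactions or genes; the matrix fixpoint replaces thousands of individual logical inference steps with a handful of dense linear-algebra operations.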
https://arxiv.org/abs/2405.06724