In recent years, Artificial Neural Networks (ANN) have become a standard in robotic control. However, a significant drawback of large-scale ANNs is their increased power consumption. This becomes a critical concern when designing autonomous aerial vehicles, given their stringent constraints on power and weight. Especially in the case of blimps, which are known for their extended endurance, power-efficient control methods are essential. Spiking neural networks (SNN) can provide a solution, facilitating energy-efficient and asynchronous event-driven processing. In this paper, we evolve SNNs for accurate altitude control of a non-neutrally buoyant indoor blimp, relying solely on onboard sensing and processing power. Compared with prior research, the blimp's altitude tracking performance improved significantly, showing reduced oscillations and minimal steady-state error. The parameters of the SNNs were optimized via an evolutionary algorithm, using a Proportional-Integral-Derivative (PID) controller as the target signal. We developed two complementary SNN controllers while examining various hidden layer structures. The first controller responds swiftly to control errors, mitigating overshoot and oscillations, while the second minimizes the steady-state error caused by drift from non-neutral buoyancy. Despite the blimp's drivetrain limitations, our SNN controllers ensured stable altitude control while employing only 160 spiking neurons.
https://arxiv.org/abs/2309.12937
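The blimp abstract uses a PID controller as the target signal for evolving the SNNs. Below is a minimal sketch of a textbook discrete-time PID, paired with a mean-squared-error fitness between SNN and PID commands. The MSE fitness (`imitation_mse`) is a hypothetical choice for illustration, since the abstract does not state the paper's fitness function. The gains are also illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PID:
    """Discrete-time PID controller: u = kp*e + ki*sum(e*dt) + kd*de/dt."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.integral += error * self.dt
        # Treat the derivative as zero on the first call to avoid a spike.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def imitation_mse(snn_outputs: Sequence[float], pid_outputs: Sequence[float]) -> float:
    """Hypothetical fitness: mean squared error between SNN and PID commands (lower is better)."""
    if len(snn_outputs) != len(pid_outputs):
        raise ValueError("output sequences must have equal length")
    return sum((s - p) ** 2 for s, p in zip(snn_outputs, pid_outputs)) / len(pid_outputs)
```

In an evolutionary loop, each candidate SNN would be run on the same altitude-error sequence as the PID. Candidates with a lower `imitation_mse` would then be kept.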
In this work, we introduce OmniDrones, an efficient and flexible platform tailored for reinforcement learning in drone control, built on Nvidia's Omniverse Isaac Sim. It employs a bottom-up design approach that allows users to easily design and experiment with various application scenarios on top of GPU-parallelized simulations. It also offers a range of benchmark tasks, presenting challenges ranging from single-drone hovering to over-actuated system tracking. In summary, we propose an open-source drone simulation platform, equipped with an extensive suite of tools for drone learning. It includes 4 drone models, 5 sensor modalities, 4 control modes, over 10 benchmark tasks, and a selection of widely used RL baselines. To showcase the capabilities of OmniDrones and to support future research, we also provide preliminary results on these benchmark tasks. We hope this platform will encourage further studies on applying RL to practical drone systems.
https://arxiv.org/abs/2309.12825
Trajectory tracking control of autonomous trolley collection robots (ATCR) is challenging because of the complex environment, severe noise, and external disturbances. This work investigates a control scheme for ATCRs subject to severe environmental interference. First, an adaptive sliding mode disturbance observer based on a kinematics model, with fast convergence, is proposed to estimate the lumped disturbances. On this basis, a robust controller with prescribed performance is designed using the backstepping technique, which improves transient performance and guarantees fast convergence. Simulation results illustrate the effectiveness of the proposed control scheme.
https://arxiv.org/abs/2309.12660
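"Prescribed performance" in the ATCR abstract usually means forcing the tracking error to stay inside a shrinking envelope. The sketch below shows the common exponential envelope rho(t) = (rho0 - rho_inf) * exp(-decay * t) + rho_inf. It also shows the logarithmic error transformation often paired with it. The symmetric envelope form and the default parameter values are assumptions for illustration; the abstract does not give the paper's exact performance function.

```python
import math


def performance_bound(t: float, rho0: float, rho_inf: float, decay: float) -> float:
    """Exponentially decaying envelope rho(t) = (rho0 - rho_inf) * exp(-decay * t) + rho_inf."""
    return (rho0 - rho_inf) * math.exp(-decay * t) + rho_inf


def within_envelope(error: float, t: float, rho0: float = 1.0,
                    rho_inf: float = 0.05, decay: float = 2.0) -> bool:
    """Symmetric prescribed-performance check: -rho(t) < e(t) < rho(t)."""
    return abs(error) < performance_bound(t, rho0, rho_inf, decay)


def transformed_error(error: float, rho: float) -> float:
    """Map e in (-rho, rho) to an unconstrained variable; keeping it bounded keeps e inside the envelope."""
    z = error / rho
    return 0.5 * math.log((1.0 + z) / (1.0 - z))
```

Here `rho0` sets the allowed initial error, `rho_inf` the allowed steady-state error, and `decay` the minimum convergence speed.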
Existing research has shown the potential of classifying Alzheimer's Disease (AD) from eye-tracking (ET) data with classifiers that rely on task-specific engineered features. In this paper, we investigate whether we can improve on existing results by using a Deep-Learning classifier trained end-to-end on raw ET data. This classifier (VTNet) uses a GRU and a CNN in parallel to leverage both visual (V) and temporal (T) representations of ET data, and it was previously used to detect user confusion while processing visual displays. A main challenge in applying VTNet to our target AD classification task is that the available ET data sequences are much longer than those used in the previous confusion detection task, pushing the limits of what is manageable by LSTM-based models. We discuss how we address this challenge and show that VTNet outperforms the state-of-the-art approaches in AD classification, providing encouraging evidence of the generality of this model for making predictions from ET data.
https://arxiv.org/abs/2309.12574
In human-robot collaboration, there is a trade-off between the speed of collaborative robots and the safety of human workers. In our previous paper, we introduced a time-optimal path tracking algorithm designed to maximize speed while ensuring the safety of human workers. This algorithm runs in real time and provides the fastest safe control input in every cycle with respect to ISO standards. However, true optimality was not achieved because conservative model simplification led to inaccurate distance computation. To attain true optimality, we require a method that can compute distances (1) at many robot configurations along a trajectory, (2) in real time for online robot control, and (3) as precisely as possible for optimal control. In this paper, we propose a batched, fast, and precise distance checking method based on precomputed link-local signed distance fields (SDFs). Our method can check distances for 500 waypoints along a trajectory in less than 1 millisecond using a GPU at runtime, making it suitable for time-critical robotic control. Additionally, we propose a neural approximation that accelerates preprocessing by a factor of 2. Finally, we experimentally demonstrate that, in a dynamic and collaborative environment, our method can navigate a 6-DoF robot earlier than a distance checker based on geometric primitives.
https://arxiv.org/abs/2309.12543
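The batched link-local SDF idea in the collaborative-robot abstract works in two steps. First, every obstacle point is expressed in each link's frame. Then that link's SDF is evaluated and the minimum is taken per waypoint. The NumPy sketch below follows those steps under simplifying assumptions. An analytic box SDF stands in for the paper's precomputed grid SDFs, and it runs on the CPU. Array shapes and function names are illustrative.

```python
import numpy as np


def box_sdf(points_local, half_extents):
    """Stand-in for a precomputed link SDF: signed distance to an axis-aligned box at the link origin."""
    q = np.abs(points_local) - half_extents
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)
    inside = np.minimum(q.max(axis=-1), 0.0)
    return outside + inside


def min_distances(link_rotations, link_positions, obstacle_points, half_extents):
    """
    link_rotations:  (W, L, 3, 3) world-from-link rotations for W waypoints and L links
    link_positions:  (W, L, 3)    link origins in the world frame
    obstacle_points: (P, 3)       points on humans/obstacles in the world frame
    half_extents:    (L, 3)       box half sizes per link
    Returns (W,) minimum signed distance for each waypoint.
    """
    # Express every obstacle point in every link frame: R^T (x - p).
    diff = obstacle_points[None, None, :, :] - link_positions[:, :, None, :]  # (W, L, P, 3)
    local = np.einsum("wlji,wlpj->wlpi", link_rotations, diff)
    d = box_sdf(local, half_extents[None, :, None, :])  # (W, L, P)
    return d.min(axis=(1, 2))
```

Because every waypoint, link, and point is handled in a single vectorized expression, the same structure maps naturally onto GPU array code.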
Panoramic videos contain richer spatial information and have attracted tremendous attention because of the exceptional experience they provide in fields such as autonomous driving and virtual reality. However, existing datasets for video segmentation focus only on conventional planar images. To address this gap, in this paper we present a panoramic video dataset, PanoVOS. The dataset provides 150 videos with high resolution and diverse motion. To quantify the domain gap between 2D planar videos and panoramic videos, we evaluate 15 off-the-shelf video object segmentation (VOS) models on PanoVOS. Through error analysis, we find that all of them fail to handle the pixel-level content discontinuities of panoramic videos. We therefore present a Panoramic Space Consistency Transformer (PSCFormer), which effectively uses the semantic boundary information of the previous frame for pixel-level matching with the current frame. Extensive experiments demonstrate that, compared with previous SOTA models, our PSCFormer network achieves clearly better segmentation results in the panoramic setting. Our dataset poses new challenges for panoramic VOS, and we hope that PanoVOS can advance the development of panoramic segmentation and tracking.
https://arxiv.org/abs/2309.12303
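One source of the pixel-level discontinuity mentioned in the PanoVOS abstract is the equirectangular layout. In that layout the leftmost and rightmost columns are neighbors on the sphere. The sketch below is a generic illustration, not part of PSCFormer. It computes horizontal distance with wrap-around, maps a column to its longitude, and circularly pads frames so that operations near the border see the other side. It assumes frames of shape (H, W) or (H, W, C) covering 360 degrees horizontally.

```python
import numpy as np


def horizontal_wrap_distance(x1: int, x2: int, width: int) -> int:
    """Column distance in an equirectangular frame, where column 0 and column width-1 are adjacent."""
    dx = abs(x1 - x2) % width
    return min(dx, width - dx)


def column_to_longitude(x: float, width: int) -> float:
    """Longitude in degrees of a pixel column centre, spanning [-180, 180)."""
    return (x + 0.5) / width * 360.0 - 180.0


def wrap_pad(frame: np.ndarray, pad: int) -> np.ndarray:
    """Circularly pad the width axis (axis 1) so the left and right borders see each other."""
    if pad <= 0:
        return frame.copy()
    return np.concatenate([frame[:, -pad:], frame, frame[:, :pad]], axis=1)
```

For example, an object that crosses the image seam at columns 4090 and 5 of a 4096-pixel-wide frame is only 11 columns wide, not 4085.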
The field of visual object tracking is dominated by methods that combine simple tracking algorithms and ad hoc schemes. Probabilistic tracking algorithms, which are leading in other fields, are surprisingly absent from the leaderboards. We found that accounting for distance in target kinematics, exploiting detector confidence, and modelling non-uniform clutter characteristics are critical for a probabilistic tracker to work in visual tracking. Previous probabilistic methods fail to address most or all of these aspects, which we believe is why they fall so far behind current state-of-the-art (SOTA) methods (there are no probabilistic trackers in the MOT17 top 100). To rekindle progress among probabilistic approaches, we propose a set of pragmatic models addressing these challenges and demonstrate how they can be incorporated into a probabilistic framework. We present BASE (Bayesian Approximation Single-hypothesis Estimator), a simple, performant, and easily extensible visual tracker that achieves SOTA performance on MOT17 and MOT20 without using Re-ID. Code will be made available at this https URL
https://arxiv.org/abs/2309.12035
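A single-hypothesis Bayesian tracker, as in the BASE abstract, is commonly built around a Kalman filter. The sketch below shows a 1D constant-velocity predict/update step. To echo the abstract's point about exploiting detector confidence, it scales the measurement noise by 1/confidence. That scaling is a hypothetical illustration, not BASE's actual model, and the clutter modelling described in the abstract is not shown.

```python
import numpy as np


def cv_predict(x, P, dt, q):
    """Constant-velocity prediction for state [position, velocity] with white-acceleration noise q."""
    F = np.array([[1.0, dt], [0.0, 1.0]])
    Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
    return F @ x, F @ P @ F.T + Q


def cv_update(x, P, z, r, confidence=1.0):
    """Position-only Kalman update; low-confidence detections are treated as noisier (hypothetical)."""
    H = np.array([[1.0, 0.0]])
    R = np.array([[r / confidence]])
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x_new = x + K @ (np.atleast_1d(z) - H @ x)
    P_new = (np.eye(2) - K @ H) @ P
    return x_new, P_new
```

With this weighting, a confident detection pulls the state estimate further toward the measurement than a weak one at the same position.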
Numerous datasets and benchmarks exist to assess and compare Simultaneous Localization and Mapping (SLAM) algorithms. Nevertheless, their precision must keep pace with the rate at which SLAM algorithms have improved in recent years. Moreover, current datasets lack a comprehensive data-collection protocol for reproducibility and for evaluating the precision or accuracy of the recorded trajectories. With this objective in mind, we propose the Robotic Total Stations Ground Truthing (RTS-GT) dataset to support localization research with six-degrees-of-freedom (DOF) ground truth trajectories. This novel dataset includes six-DOF ground truth trajectories generated by a system of three Robotic Total Stations (RTSs) tracking moving robotic platforms. Furthermore, we compare the performance of the RTS-based system with a Global Navigation Satellite System (GNSS)-based setup. The dataset comprises around sixty experiments conducted in various conditions over a period of 17 months and encompasses over 49 kilometers of trajectories, making it the most extensive dataset of RTS-based measurements to date. Additionally, we provide the precision of all poses for each experiment, a feature not found in current state-of-the-art datasets. Our results demonstrate that RTSs provide measurements that are 22 times more stable than GNSS in various environmental settings, making them a valuable resource for SLAM benchmark development.
https://arxiv.org/abs/2309.11935
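Three tracked points are the minimum needed to recover a full 6-DOF pose, which is why the RTS-GT setup uses three total stations. The sketch below shows one standard way to do it, the SVD-based Kabsch method. It assumes the prism positions in the robot's body frame are known from calibration and that their world positions are the RTS measurements; the paper's actual pose pipeline may differ.

```python
import numpy as np


def rigid_transform(body_pts: np.ndarray, world_pts: np.ndarray):
    """Least-squares R, t with world ~= R @ body + t (Kabsch, no scale). Inputs are (N, 3), N >= 3 non-collinear."""
    cb = body_pts.mean(axis=0)
    cw = world_pts.mean(axis=0)
    H = (body_pts - cb).T @ (world_pts - cw)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = cw - R @ cb
    return R, t
```

The reflection correction matters especially with only three points. In that case the points are coplanar and the third singular value is zero.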
Nonlinear tracking control, which enables a dynamical system to track a desired trajectory, is fundamental to robotics and serves a wide range of civil and defense applications. In control engineering, designing tracking control requires complete knowledge of the system model and equations. We develop a model-free machine-learning framework to control a two-arm robotic manipulator using only partially observed states, where the controller is realized by reservoir computing. Training exploits a stochastic input with two components: the observed partial state vector, and its immediate future. The neural machine thus learns to regard the latter as the future state of the former. In the testing (deployment) phase, the immediate-future component is replaced by the desired observation vector from the reference trajectory. We demonstrate the effectiveness of the control framework on a variety of periodic and chaotic signals and establish its robustness against measurement noise, disturbances, and uncertainties.
https://arxiv.org/abs/2309.11470
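The reservoir-computing abstract describes a train/deploy input swap. During training, the reservoir sees the current observation and its immediate future under random controls. During deployment, the future slot is filled with the next reference value. The echo state network sketch below illustrates that idea under stated assumptions:

- A scalar toy plant, y[t+1] = 0.9*y[t] + 0.5*u[t], replaces the two-arm manipulator.
- The readout is fit by ridge regression.
- The readout sees the reservoir state, the raw input, and a bias term.

Reservoir size, spectral radius, and ridge strength are illustrative, not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(0)


def plant(y, u):
    """Hypothetical toy plant standing in for the manipulator."""
    return 0.9 * y + 0.5 * u


N_RES, N_IN = 100, 2
W_in = rng.uniform(-0.5, 0.5, size=(N_RES, N_IN))
W = rng.normal(size=(N_RES, N_RES))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius 0.9


def reservoir_step(state, inp):
    return np.tanh(W @ state + W_in @ inp)


def features(state, inp):
    # Readout sees reservoir state, raw input, and a bias term.
    return np.concatenate([state, inp, [1.0]])


# Training: stochastic controls; input = (observed state, its immediate future), target = control.
T = 2000
u_train = rng.uniform(-1.0, 1.0, size=T)
y = np.zeros(T + 1)
for t in range(T):
    y[t + 1] = plant(y[t], u_train[t])

state = np.zeros(N_RES)
rows = []
for t in range(T):
    inp = np.array([y[t], y[t + 1]])
    state = reservoir_step(state, inp)
    rows.append(features(state, inp))
X = np.array(rows)

ridge = 1e-4
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ u_train)
train_mse = float(np.mean((X @ W_out - u_train) ** 2))

# Deployment: the "future" slot is filled with the next reference value.
steps = 300
ref = 0.5 * np.sin(0.05 * np.arange(steps + 1))
y_cl = np.zeros(steps + 1)
state = np.zeros(N_RES)
for t in range(steps):
    inp = np.array([y_cl[t], ref[t + 1]])
    state = reservoir_step(state, inp)
    u = features(state, inp) @ W_out
    y_cl[t + 1] = plant(y_cl[t], u)
tracking_mae = float(np.mean(np.abs(y_cl[1:] - ref[1:])))
```

The controller never sees a model of the plant. It only learns which control moves the observation from "now" to "next", which is what lets the reference stand in for "next" at deployment time.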
Camera localization in 3D LiDAR maps has gained increasing attention because of its promising ability to handle complex scenarios, overcoming the limitations of visual-only localization methods. However, most existing methods focus on bridging the cross-modal gap and estimate camera poses frame by frame without considering the relationship between adjacent frames, which makes pose tracking unstable. To alleviate this, we propose to couple the 2D-3D correspondences of adjacent frames through 2D-2D feature matching, establishing multi-view geometric constraints for estimating multiple camera poses simultaneously. Specifically, we propose a new 2D-3D pose tracking framework consisting of a front-end hybrid flow estimation network for consecutive frames and a back-end pose optimization module. We further design a cross-modal consistency-based loss to incorporate the multi-view constraints during training and inference. We evaluate the proposed framework on the KITTI and Argoverse datasets. Experimental results demonstrate its superior performance compared with existing frame-by-frame 2D-3D pose tracking methods and state-of-the-art vision-only pose tracking algorithms. More online pose tracking videos are available at \url{this https URL}
https://arxiv.org/abs/2309.11335
Tracking plant cells in microscope images is a challenging problem because of biological phenomena such as the large number of cells, the non-uniform growth of different layers of tightly packed plant cells, and cell division. Moreover, images of deeper tissue layers are noisy, and unavoidable systematic errors inherent in the imaging process further complicate the problem. In this paper, we propose a novel learning-based method that exploits the tightly packed three-dimensional structure of plant cells to build a three-dimensional graph for accurate cell tracking. We further propose novel algorithms for cell division detection and effective three-dimensional registration that improve upon the state-of-the-art algorithms. We demonstrate the efficacy of our algorithm in terms of tracking accuracy and inference time on a benchmark dataset.
https://arxiv.org/abs/2309.11157
Wheeled mobile robots need to be able to estimate their motion and the effect of their control actions for navigation planning. In this paper, we present ST-VIO, a novel approach that tightly fuses a single-track dynamics model for wheeled ground vehicles with visual-inertial odometry (VIO). Our method calibrates and adapts the dynamics model online and enables accurate forward prediction conditioned on future control inputs. The single-track dynamics model uses ordinary differential equations to approximate wheeled vehicle motion on flat ground under given control inputs. We use a singularity-free and differentiable variant of the single-track model to enable seamless integration as a dynamics factor into VIO and to optimize the model parameters online together with the VIO state variables. We validate our method with real-world data in both indoor and outdoor environments with different terrain types and wheels. In our experiments, we demonstrate that ST-VIO not only adapts to changes in the environment and achieves accurate prediction under new control inputs, but even improves tracking accuracy.
https://arxiv.org/abs/2309.11148
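The simplest member of the single-track family discussed in the ST-VIO abstract is the kinematic bicycle model. With the rear axle as the reference point, it is x' = v cos(yaw), y' = v sin(yaw), yaw' = v tan(steer) / L. The sketch below integrates it with forward Euler to predict motion under a sequence of future controls. This kinematic model is a simpler stand-in. It is not the singularity-free dynamic variant the paper optimizes inside VIO, and the wheelbase and step size are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class State:
    x: float
    y: float
    yaw: float


def step(s: State, v: float, steer: float, wheelbase: float, dt: float) -> State:
    """One forward-Euler step of the kinematic bicycle model (rear-axle reference point)."""
    return State(
        x=s.x + v * math.cos(s.yaw) * dt,
        y=s.y + v * math.sin(s.yaw) * dt,
        yaw=s.yaw + v / wheelbase * math.tan(steer) * dt,
    )


def rollout(s: State, controls: Iterable[Tuple[float, float]], wheelbase: float, dt: float) -> List[State]:
    """Predict the trajectory conditioned on future (speed, steering angle) commands."""
    traj = [s]
    for v, steer in controls:
        s = step(s, v, steer, wheelbase, dt)
        traj.append(s)
    return traj
```

In an estimator like the one described, `wheelbase` (and any slip or actuation parameters) would be the quantities calibrated online.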
In recent years, advanced model-based and data-driven control methods have been unlocking the potential of complex robotic systems, and we can expect this trend to continue at an exponential rate in the near future. However, ensuring safety with these advanced control methods remains a challenge. A well-known tool for making controllers (either Model Predictive Controllers or Reinforcement Learning policies) safe is the so-called control-invariant set (a.k.a. safe set). Unfortunately, for nonlinear systems, such a set generally cannot be computed exactly. Numerical algorithms exist for computing approximate control-invariant sets, but classic control-theoretic methods break down if the set is not exact. This paper presents our recent efforts to address this issue. We present a novel Model Predictive Control scheme that can guarantee recursive feasibility and/or safety under weaker assumptions than classic methods. In particular, recursive feasibility is guaranteed by moving the safe-set constraint backward over the horizon and assuming that the set satisfies a condition weaker than control invariance. Safety, in contrast, is guaranteed under an even weaker assumption on the safe set, by triggering a safe task-abortion strategy whenever a risk of constraint violation is detected. We evaluated our approach on a simulated robot manipulator, empirically demonstrating that it leads to fewer constraint violations than state-of-the-art approaches while retaining reasonable performance in terms of tracking cost and number of completed tasks.
https://arxiv.org/abs/2309.11124
In this paper, we present a simultaneous exploration and object search framework for autonomous trolley collection. For environment representation, a task-oriented environment partitioning algorithm is presented to extract diverse information for each sub-task. First, after outlier removal, LiDAR data is classified into potential objects, walls, and obstacles. The segmented point clouds are then transformed into a hybrid map with the following functional components: object proposals to avoid missing trolleys during exploration; room layouts for semantic space segmentation; and polygonal obstacles containing geometric information for efficient motion planning. For exploration with simultaneous trolley collection, we propose an efficient exploration-based object search method. First, a traveling salesman problem with precedence constraints (TSP-PC) is formulated by grouping frontiers and object proposals. The next target is selected by prioritizing object search while avoiding excessive robot backtracking. Then, feasible trajectories with adequate obstacle clearance are generated by topological graph search. We validate the proposed framework through simulations and demonstrate the system on real-world autonomous trolley collection tasks.
https://arxiv.org/abs/2309.11107
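To make the TSP-PC formulation in the trolley-collection abstract concrete, the sketch below brute-forces a small open-path TSP with precedence constraints. It finds the shortest visiting order from the robot's start in which every pair (a, b) has a visited before b. Exhaustive search is only viable for a handful of targets and is an illustration, not the paper's solver. The rule "object proposals before frontiers" in the usage note is a hypothetical way to encode prioritizing object search.

```python
import itertools
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(start: Point, order: Sequence[str], coords: Dict[str, Point]) -> float:
    """Length of the open path start -> order[0] -> ... -> order[-1] (no return leg)."""
    length, cur = 0.0, start
    for node in order:
        length += math.dist(cur, coords[node])
        cur = coords[node]
    return length


def respects(order: Sequence[str], precedence: List[Tuple[str, str]]) -> bool:
    pos = {n: i for i, n in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in precedence)


def solve_tsp_pc(start: Point, coords: Dict[str, Point],
                 precedence: List[Tuple[str, str]]) -> Tuple[Optional[tuple], float]:
    best, best_len = None, math.inf
    for order in itertools.permutations(coords):
        if not respects(order, precedence):
            continue  # skip orders that violate a precedence constraint
        length = tour_length(start, order, coords)
        if length < best_len:
            best, best_len = order, length
    return best, best_len
```

Usage: with `coords = {"trolley_A": (5, 0), "frontier_1": (1, 0), "frontier_2": (6, 0)}` and the constraint that `trolley_A` precedes both frontiers, the best order is `trolley_A, frontier_2, frontier_1` (length 11). Without constraints, the robot would first detour to `frontier_1` (length 6).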
This paper presents a novel Stochastic Optimal Control (SOC) method based on Model Predictive Path Integral control (MPPI), named Stein Variational Guided MPPI (SVG-MPPI), designed to handle rapidly shifting multimodal optimal action distributions. While MPPI can find a Gaussian-approximated optimal action distribution in closed form, i.e., without iterative solution updates, it struggles with multimodal optimal distributions, such as those arising from non-convex constraints for obstacle avoidance. This is because a single Gaussian cannot adequately represent such distributions. To overcome this limitation, our method identifies a target mode of the optimal distribution and guides the solution to converge to it. In the proposed method, the target mode is roughly estimated using a modified Stein Variational Gradient Descent (SVGD) method and embedded into the MPPI algorithm to find a closed-form "mode-seeking" solution that covers only the target mode, thus preserving the fast convergence of MPPI. Our simulation and real-world experimental results demonstrate that SVG-MPPI outperforms both the original MPPI and other state-of-the-art sampling-based SOC algorithms in path tracking and obstacle avoidance. Source code: this https URL
https://arxiv.org/abs/2309.11040
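The closed-form update that the SVG-MPPI abstract builds on is the standard MPPI step. Perturbed rollouts are weighted by exp(-cost / lambda), and the nominal control sequence moves by the weighted average of the perturbations. The sketch below shows only this baseline step on a hypothetical 1D integrator with a quadratic cost. The SVGD-based mode estimation that distinguishes SVG-MPPI is not shown, and the horizon, sample count, noise level, and temperature are illustrative.

```python
import numpy as np


def mppi_update(nominal, noise, costs, lam):
    """
    nominal: (H,) or (H, m) control sequence
    noise:   (K, H) or (K, H, m) sampled perturbations
    costs:   (K,) total cost of each perturbed rollout
    lam:     temperature; smaller values concentrate weight on the best samples
    """
    shifted = costs - costs.min()  # subtract the minimum for numerical stability
    weights = np.exp(-shifted / lam)
    weights /= weights.sum()
    return nominal + np.tensordot(weights, noise, axes=1)


def rollout_cost(x0, controls, target, dt=0.1):
    """Hypothetical toy system x' = u with quadratic tracking and control costs."""
    x, cost = x0, 0.0
    for u in controls:
        x = x + u * dt
        cost += (x - target) ** 2 + 0.01 * u ** 2
    return cost


def run_mppi(x0=0.0, target=1.0, horizon=20, samples=256, sigma=1.0, lam=1.0, iters=10, seed=0):
    rng = np.random.default_rng(seed)
    nominal = np.zeros(horizon)
    for _ in range(iters):
        noise = rng.normal(0.0, sigma, size=(samples, horizon))
        costs = np.array([rollout_cost(x0, nominal + eps, target) for eps in noise])
        nominal = mppi_update(nominal, noise, costs, lam)
    return nominal
```

Because the result is a weighted average, two equally good but distinct modes (for example, passing an obstacle on the left or the right) can be averaged into a poor compromise. This is the Gaussian limitation the abstract targets.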
Existing nighttime unmanned aerial vehicle (UAV) trackers follow an "Enhance-then-Track" architecture: they first use a light enhancer to brighten the nighttime video and then employ a daytime tracker to locate the object. Performing enhancement and tracking separately prevents building an end-to-end trainable vision system. To address this, we propose a novel architecture called Darkness Clue-Prompted Tracking (DCPT) that achieves robust nighttime UAV tracking by efficiently learning to generate darkness clue prompts. Without a separate enhancer, DCPT directly encodes anti-dark capabilities into prompts using a darkness clue prompter (DCP). Specifically, DCP iteratively learns emphasizing and undermining projections for darkness clues. It then injects these learned visual prompts across transformer layers into a daytime tracker whose parameters are kept fixed. Moreover, a gated feature aggregation mechanism enables adaptive fusion among prompts and between the prompts and the base model. Extensive experiments show that DCPT achieves state-of-the-art performance on multiple dark-scenario benchmarks. The unified end-to-end learning of enhancement and tracking in DCPT yields a more trainable system, and darkness clue prompting efficiently injects anti-dark knowledge without extra modules. Code and models will be released.
https://arxiv.org/abs/2309.10491
3D scene graphs offer a more efficient representation of the environment by hierarchically organizing diverse semantic entities and the topological relationships among them. Fiducial markers, on the other hand, offer a valuable mechanism for encoding comprehensive information about environments and the objects within them. In the context of Visual SLAM (VSLAM), especially when the reconstructed maps are enriched with practical semantic information, these markers can enhance the map by adding valuable semantic information and fostering meaningful connections among semantic objects. To this end, this paper exploits fiducial markers to equip a VSLAM framework with hierarchical representations, generating optimizable multi-layered vision-based situational graphs. The framework comprises a conventional VSLAM system with low-level feature tracking and mapping capabilities, bolstered by a fiducial marker map. The fiducial markers help identify walls and doors in the environment and subsequently establish meaningful associations with high-level entities, including corridors and rooms. Experiments are conducted on a real-world dataset collected with various legged robots and benchmarked against a Light Detection And Ranging (LiDAR)-based framework (S-Graphs) as the ground truth. The results show that our framework not only excels at crafting a richer, multi-layered hierarchical map of the environment but also improves robot pose accuracy compared with state-of-the-art methods.
https://arxiv.org/abs/2309.10461
An accurate and uncertainty-aware 3D human body pose estimation is key to enabling truly safe and efficient human-robot interaction. Current uncertainty-aware methods in 3D human pose estimation are limited to predicting the uncertainty of the body posture, effectively neglecting the body shape and root pose. In this work, we present GloPro, which, to the best of our knowledge, is the first framework to predict an uncertainty distribution of a 3D body mesh, including its shape, pose, and root pose, by efficiently fusing visual clues with a learned motion model. We demonstrate that it vastly outperforms state-of-the-art methods in human trajectory accuracy in a world coordinate system (even in the presence of severe occlusions), yields consistent uncertainty distributions, and runs in real time. Our code will be released upon acceptance at this https URL.
https://arxiv.org/abs/2309.10369
Multiple pedestrian tracking faces the challenge of tracking pedestrians under occlusion. Because of occlusion, existing methods suffer from inaccurate motion estimation, appearance feature extraction, and association. This leads to an inadequate Identification F1-Score (IDF1), excessive ID switches (IDSw), and insufficient association accuracy and recall (AssA and AssR). We found that the main cause is abnormal detections produced by partial occlusion. In this paper, we argue that the key lies in explicit motion estimation, reliable appearance features, and fair association in occlusion scenes. Specifically, we propose an adaptive occlusion-aware multiple pedestrian tracker, OccluTrack. First, we introduce an abnormal motion suppression mechanism into the Kalman filter to adaptively detect and suppress outlier motions caused by partial occlusion. Second, we propose a pose-guided re-ID module to extract discriminative part features for partially occluded pedestrians. Last, we design a new occlusion-aware association method that provides fair IoU and appearance-embedding distance measurements for occluded pedestrians. Extensive evaluation results demonstrate that OccluTrack outperforms state-of-the-art methods on the MOT-Challenge datasets. In particular, the improvements in IDF1, IDSw, AssA, and AssR demonstrate the effectiveness of OccluTrack in tracking and association.
https://arxiv.org/abs/2309.10360
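Two quantities named in the OccluTrack abstract have standard definitions that are easy to state in code. The first is the intersection-over-union (IoU) between two boxes, used in association. The second is the identity F1 score, IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), computed from identity-level true positives, false positives, and false negatives. The sketch below gives the plain versions. OccluTrack's occlusion-aware "fair IoU" modifies the first, and that modification is not reproduced here. Boxes are assumed to be (x1, y1, x2, y2) with x2 > x1 and y2 > y1.

```python
from typing import Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 score from identity-level TP, FP, and FN counts."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom > 0 else 0.0
```

A partially occluded pedestrian typically yields a truncated detection box. Its IoU with the full predicted box drops even when the identity is correct, which is one way occlusion degrades association.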
Artificial Intelligence techniques can be used to classify a patient's physical activities and predict vital signs for remote patient monitoring. Regression analysis based on non-linear models such as deep learning models has limited explainability because of their black-box nature. This can force decision-makers to take blind leaps of faith based on non-linear model results, especially in healthcare applications. In non-invasive monitoring, patient data from tracking sensors and the patient's predisposing clinical attributes serve as input features for predicting future vital signs. Explaining how the various features contribute to the overall output of the monitoring application is critical for a clinician's decision-making. In this study, we propose an Explainable AI for Quantitative analysis (QXAI) framework that provides post-hoc model explainability and intrinsic explainability for regression and classification tasks in a supervised learning setting. This was achieved by using the Shapley values concept and incorporating attention mechanisms into deep learning models. We adopted artificial neural networks (ANN) and attention-based Bidirectional LSTM (BiLSTM) models to predict heart rate and classify physical activities from sensor data. The deep learning models achieved state-of-the-art results in both prediction and classification tasks. Global and local explanations were computed on the input data to understand the contributions of the various patient data features. The proposed QXAI framework was evaluated on PPG-DaLiA data for heart rate prediction and on mobile health (MHEALTH) data for physical activity classification. Monte Carlo approximation was applied within the framework to overcome the time complexity and high computational cost of Shapley value calculations.
https://arxiv.org/abs/2309.10293
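The QXAI abstract mentions Monte Carlo approximation of Shapley values. A common approach is permutation sampling. For each random feature ordering, features are switched from a baseline value to their actual value one at a time, and each change in model output is credited to the feature just added. The sketch below implements that estimator. Representing "absent" features by a single baseline vector is an assumption made for brevity; implementations often average over a background dataset instead. This is not necessarily QXAI's exact procedure.

```python
import random
from typing import Callable, List, Sequence


def shapley_monte_carlo(model: Callable[[List[float]], float], x: Sequence[float],
                        baseline: Sequence[float], n_samples: int = 200, seed: int = 0) -> List[float]:
    """Permutation-sampling estimate of each feature's Shapley value for input x."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)  # start with every feature "absent"
        prev_value = model(current)
        for i in order:  # add features one at a time in random order
            current[i] = x[i]
            value = model(current)
            phi[i] += value - prev_value  # marginal contribution of feature i
            prev_value = value
    return [p / n_samples for p in phi]
```

Each sample costs n + 1 model evaluations instead of the 2^n needed for exact Shapley values. By construction, the estimates always sum to model(x) - model(baseline).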