This paper presents a novel approach to the digital signing of electronic documents that combines a camera-based interaction system, single-finger tracking for signature recognition, and hand gestures that execute multiple commands. The proposed solution, referred to as "Air Signature," involves writing the signature in the air in front of the camera, rather than relying on traditional methods such as drawing with a mouse or signing on paper and showing it to a web camera. The goal is to develop a state-of-the-art method for detecting and tracking gestures and objects in real time. The proposed methods include applying existing gesture recognition and object tracking systems, improving accuracy through smoothing and line drawing, and maintaining continuity during fast finger movements. The fingertip detection, sketching, and overall signing process are evaluated to assess the effectiveness of the proposed solution. The secondary objective of this research is to develop a model that can effectively recognize the unique signature of a user. This type of signature can be verified by neural cores that analyze the movement, speed, and stroke pixels of the signing in real time. The neural cores use machine learning algorithms to match air signatures to the individual's stored signatures, providing a secure and efficient method of verification. The proposed system does not require sensors or any hardware other than the camera.
https://arxiv.org/abs/2405.10868
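As a rough illustration of the smoothing and continuity steps mentioned in the Air Signature abstract above, the sketch below smooths a sequence of fingertip positions with a moving average and fills large jumps with interpolated points. The abstract does not say which smoothing or line-drawing technique the authors use, so the moving average, the linear interpolation, and the names `smooth_points`, `fill_gaps`, `window`, and `max_step` are all assumptions made for illustration.

```python
import math
from collections import deque


def smooth_points(points, window=3):
    """Moving-average smoothing of (x, y) fingertip positions.

    `window` is a hypothetical parameter: the number of most recent
    samples averaged to produce each smoothed point.
    """
    recent = deque(maxlen=window)
    smoothed = []
    for x, y in points:
        recent.append((x, y))
        n = len(recent)
        smoothed.append((sum(p[0] for p in recent) / n,
                         sum(p[1] for p in recent) / n))
    return smoothed


def fill_gaps(points, max_step=5.0):
    """Insert linearly interpolated points between samples that are far apart.

    This keeps the drawn stroke continuous when the finger moves quickly
    between two camera frames. `max_step` (in pixels) is an assumed threshold.
    """
    if not points:
        return []
    out = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)
        steps = max(1, math.ceil(dist / max_step))
        for k in range(1, steps + 1):
            t = k / steps
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out
```

In a camera loop, each newly detected fingertip position would be appended to the raw list, and the smoothed, gap-filled list would be drawn as a polyline on the canvas.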
This paper presents an original approach to vehicle obstacle avoidance. It involves the development of a nonlinear Model Predictive Contouring Control, which uses torque vectoring to stabilise and drive the vehicle in evasive manoeuvres at the limit of handling. The proposed algorithm combines motion planning, path tracking and vehicle stability objectives, prioritising collision avoidance in emergencies. The controller's prediction model is a nonlinear double-track vehicle model based on an extended Fiala tyre to capture the nonlinear coupled longitudinal and lateral dynamics. The controller computes the optimal steering angle and the longitudinal force for each of the four wheels to minimise tracking error in safe situations and maximise the vehicle-to-obstacle distance in emergencies. Thanks to the optimisation of the longitudinal tyre forces, the proposed controller can produce an extra yaw moment, increasing the vehicle's lateral agility to avoid obstacles while keeping the vehicle stable. The optimal forces are constrained to the tyre friction circle so that they do not exceed the tyre and vehicle capabilities. In a high-fidelity simulation environment, we demonstrate the benefits of torque vectoring, showing that our proposed approach is capable of successfully avoiding obstacles and keeping the vehicle stable during a double lane change manoeuvre, in comparison to baselines lacking torque vectoring or collision avoidance prioritisation.
https://arxiv.org/abs/2405.10847
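Two ideas from the torque-vectoring abstract above can be sketched in a few lines: the friction-circle limit on each tyre's combined force, and the extra yaw moment produced by unequal left and right longitudinal forces. This is a simplified sketch, not the paper's prediction model: it assumes zero steering angle when computing the yaw moment, and the function names, the friction coefficient `mu`, and the `track_width` parameter are hypothetical.

```python
import math


def within_friction_circle(fx, fy, fz, mu):
    """Check that the combined tyre force stays inside the friction circle.

    fx, fy: longitudinal and lateral tyre force [N]
    fz:     vertical tyre load [N]
    mu:     assumed tyre-road friction coefficient
    """
    return math.hypot(fx, fy) <= mu * fz


def yaw_moment_from_longitudinal(fx_fl, fx_fr, fx_rl, fx_rr, track_width):
    """Yaw moment [N*m] from left/right differences in longitudinal force.

    Simplification: steering angle is neglected, so each force acts along
    the vehicle's longitudinal axis at half the track width from the
    centreline. Positive values turn the vehicle counter-clockwise (left).
    """
    right = fx_fr + fx_rr
    left = fx_fl + fx_rl
    return 0.5 * track_width * (right - left)
```

Braking the left wheels while driving the right wheels, for example, gives a positive yaw moment that helps the vehicle rotate to the left without changing the net longitudinal force.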
We propose an approach to construct text-based time-series indices in an optimal way--typically, indices that maximize the contemporaneous relation or the predictive performance with respect to a target variable, such as inflation. We illustrate our methodology with a corpus of news articles from the Wall Street Journal by optimizing text-based indices focusing on tracking the VIX index and inflation expectations. Our results highlight the superior performance of our approach compared to existing indices.
https://arxiv.org/abs/2405.10449
Locating an object in a sequence of frames, given its appearance in the first frame of the sequence, is a hard problem that involves many stages. Usually, state-of-the-art methods focus on bringing novel ideas in the visual encoding or relational modelling phases. However, in this work, we show that bounding box regression from learned joint search and template features is of high importance as well. While previous methods relied heavily on well-learned features representing interactions between search and template, we hypothesize that the receptive field of the input convolutional bounding box network plays an important role in accurately determining the object location. To this end, we introduce two novel bounding box regression networks: inception and deformable. Experiments and ablation studies show that our inception module installed on the recent ODTrack outperforms the latter on three benchmarks: the GOT-10k, the UAV123 and the OTB2015.
https://arxiv.org/abs/2405.10444
The exploration of under-ice environments presents unique challenges due to limited access for scientific research. This report investigates the potential of deploying a fully actuated Remotely Operated Vehicle (ROV) for shallow area exploration beneath ice sheets. Leveraging advancements in marine robotics technology, ROVs offer a promising solution for extending human presence into remote underwater locations. To enable successful under-ice exploration, the ROV must follow precise trajectories for effective localization signal reception. This study develops a multi-input-multi-output (MIMO) nonlinear system controller, incorporating a Lyapunov-based stability guarantee and an adaptation law to mitigate unknown environmental disturbances. Fuzzy logic is employed to dynamically adjust adaptation rates, enhancing performance in highly nonlinear ROV dynamic systems. Additionally, a Particle Swarm Optimization (PSO) algorithm automates the tuning of controller parameters for optimal trajectory tracking. The report details the ROV dynamic model, the proposed control framework, and the PSO-based tuning process. Simulation-based experiments validate the efficacy of the methodology, with experimental results demonstrating superior trajectory tracking performance compared to baseline controllers. This work contributes to the advancement of under-ice exploration capabilities and sets the stage for future research in marine robotics and autonomous underwater systems.
https://arxiv.org/abs/2405.10441
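The PSO-based tuning mentioned in the under-ice ROV abstract above can be illustrated with a minimal particle swarm optimiser. This is a generic textbook-style sketch, not the authors' implementation: in the paper the cost would presumably come from a trajectory-tracking simulation, whereas here `cost` is any user-supplied function, and the names `pso_minimize`, `inertia`, `c_personal`, and `c_global` as well as their default values are assumptions.

```python
import random


def pso_minimize(cost, bounds, n_particles=20, n_iters=100, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5):
    """Minimise `cost` over a box given by `bounds` = [(low, high), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # per-particle best position
    best_cost = [cost(p) for p in pos]        # per-particle best cost
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]   # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                # Velocity blends momentum, pull to own best, pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c_global * rng.random() * (g_pos[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_pos, g_cost = pos[i][:], c
    return g_pos, g_cost
```

To tune controller gains, `cost` would run a closed-loop simulation with the candidate gains and return a tracking-error measure such as the integrated squared error; a simple quadratic can stand in for it when trying out the optimiser.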
Single object tracking is a vital task of many applications in critical fields. However, it is still considered one of the most challenging vision tasks. In recent years, computer vision, especially object tracking, witnessed the introduction or adoption of many novel techniques, setting new fronts for performance. In this survey, we visit some of the cutting-edge techniques in vision, such as Sequence Models, Generative Models, Self-supervised Learning, Unsupervised Learning, Reinforcement Learning, Meta-Learning, Continual Learning, and Domain Adaptation, focusing on their application in single object tracking. We propose a novel categorization of single object tracking methods based on novel techniques and trends. Also, we conduct a comparative analysis of the performance reported by the methods presented on popular tracking benchmarks. Moreover, we analyze the pros and cons of the presented approaches and present a guide for non-traditional techniques in single object tracking. Finally, we suggest potential avenues for future research in single-object tracking.
https://arxiv.org/abs/2405.10439
The market for Unmanned Aerial Vehicles (UAVs) has grown significantly, and because drones are available at low cost, the risk of their misuse for illegal purposes such as drug trafficking, spying, and terrorist attacks, which pose high risks to national security, is rising. Detecting and tracking unauthorized drones to prevent future attacks that threaten lives, facilities, and security has therefore become a necessity. Drone detection can be performed using different sensors; image-based detection is one such option, made practical by the development of artificial intelligence techniques. However, identifying the type of an unauthorized drone remains a challenge due to the lack of drone-type datasets. In this paper, we therefore provide a dataset of various drones, as well as a comparison of well-known object detection models on the proposed dataset, including several versions of the YOLO algorithm (v3, v4, and v5) along with Detectronv2. The experimental results of the different models are provided along with a description of each method. The collected dataset can be found at this https URL.
https://arxiv.org/abs/2405.10398
Intraoperative ultrasound (iUS) imaging has the potential to improve surgical outcomes in brain surgery. However, its interpretation is challenging, even for expert neurosurgeons. In this work, we designed the first patient-specific framework that performs brain tumor segmentation in trackerless iUS. To disambiguate ultrasound imaging and adapt to the neurosurgeon's surgical objective, a patient-specific real-time network is trained using synthetic ultrasound data generated by simulating virtual iUS sweep acquisitions in pre-operative MR data. Extensive experiments performed on real ultrasound data demonstrate the effectiveness of the proposed approach, which adapts to the surgeon's definition of surgical targets and outperforms non-patient-specific models, neurosurgeon experts, and high-end tracking systems. Our code is available at: this https URL.
https://arxiv.org/abs/2405.09959
Quadrotors are widely employed across various domains, yet the conventional type faces limitations due to underactuation, where attitude control is closely tied to positional adjustments. In contrast, quadrotors equipped with tiltable rotors offer overactuation, empowering them to track both position and attitude trajectories. However, the nonlinear dynamics of the drone body and the sluggish response of tilting servos pose challenges for conventional cascade controllers. In this study, we propose a control methodology for tilting-rotor quadrotors based on nonlinear model predictive control (NMPC). Unlike conventional approaches, our method preserves the full dynamics without simplification and utilizes actuator commands directly as control inputs. Notably, we incorporate a first-order servo model within the NMPC framework. Through simulation, we observe that integrating the servo dynamics not only enhances control performance but also accelerates convergence. To assess the efficacy of our approach, we fabricate a tiltable-quadrotor and deploy the algorithm onboard at a frequency of 100Hz. Extensive real-world experiments demonstrate rapid, robust, and smooth pose tracking performance.
https://arxiv.org/abs/2405.09871
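The first-order servo model mentioned in the tilting-rotor quadrotor abstract above has a standard form, d(angle)/dt = (command - angle) / time_constant, which can be stepped exactly over one control period. The sketch below shows only this generic model; how the authors embed it in their NMPC formulation is not described in the abstract, and the function name `servo_step` and the example time constant are assumptions. The 0.01 s step matches the 100 Hz rate quoted in the abstract.

```python
import math


def servo_step(angle, command, time_constant, dt):
    """Advance a first-order servo lag by one time step.

    Exact discretisation of d(angle)/dt = (command - angle) / time_constant
    for a command held constant over the interval dt.
    """
    decay = math.exp(-dt / time_constant)
    return command + (angle - command) * decay


# Step response at 100 Hz with an assumed 0.1 s servo time constant.
angle = 0.0
for _ in range(10):
    angle = servo_step(angle, command=1.0, time_constant=0.1, dt=0.01)
```

After one time constant the tilt angle has covered roughly 63% of the commanded change, which illustrates why a controller that assumes instantaneous servos can mispredict the thrust direction.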
Task-oriented dialogue systems are broadly used in virtual assistants and other automated services, providing interfaces between users and machines to facilitate specific tasks. Nowadays, task-oriented dialogue systems have greatly benefited from pre-trained language models (PLMs). However, their task-solving performance is constrained by the inherent capacities of PLMs, and scaling these models is expensive and complex as the model size becomes larger. To address these challenges, we propose Soft Mixture-of-Expert Task-Oriented Dialogue system (SMETOD) which leverages an ensemble of Mixture-of-Experts (MoEs) to excel at subproblems and generate specialized outputs for task-oriented dialogues. SMETOD also scales up a task-oriented dialogue system with simplicity and flexibility while maintaining inference efficiency. We extensively evaluate our model on three benchmark functionalities: intent prediction, dialogue state tracking, and dialogue response generation. Experimental results demonstrate that SMETOD achieves state-of-the-art performance on most evaluated metrics. Moreover, comparisons against existing strong baselines show that SMETOD has a great advantage in the cost of inference and correctness in problem-solving.
https://arxiv.org/abs/2405.09744
Accurate detection of vulvovaginal candidiasis is critical for women's health, yet its sparse distribution and visually ambiguous characteristics pose significant challenges for accurate identification by pathologists and neural networks alike. Our eye-tracking data reveals that areas garnering sustained attention - yet not marked by experts after deliberation - are often aligned with false positives of neural networks. Leveraging this finding, we introduce Gaze-DETR, a pioneering method that integrates gaze data to enhance neural network precision by diminishing false positives. Gaze-DETR incorporates a universal gaze-guided warm-up protocol applicable across various detection methods and a gaze-guided rectification strategy specifically designed for DETR-based models. Our comprehensive tests confirm that Gaze-DETR surpasses existing leading methods, showcasing remarkable improvements in detection accuracy and generalizability.
https://arxiv.org/abs/2405.09463
The discovery of linear embedding is the key to the synthesis of linear control techniques for nonlinear systems. In recent years, while Koopman operator theory has become a prominent approach for learning these linear embeddings through data-driven methods, these algorithms often exhibit limitations in generalizability beyond the distribution captured by training data and are not robust to changes in the nominal system dynamics induced by intrinsic or environmental factors. To overcome these limitations, this study presents an adaptive Koopman architecture capable of responding to the changes in system dynamics online. The proposed framework initially employs an autoencoder-based neural network that utilizes input-output information from the nominal system to learn the corresponding Koopman embedding offline. Subsequently, we augment this nominal Koopman architecture with a feed-forward neural network that learns to modify the nominal dynamics in response to any deviation between the predicted and observed lifted states, leading to improved generalization and robustness to a wide range of uncertainties and disturbances compared to contemporary methods. Extensive tracking control simulations, which are undertaken by integrating the proposed scheme within a Model Predictive Control framework, are used to highlight its robustness against measurement noise, disturbances, and parametric variations in system dynamics.
https://arxiv.org/abs/2405.09101
The detection and tracking of small targets in passive optical remote sensing (PORS) has broad applications. However, most of the previously proposed methods seldom utilize the abundant temporal features formed by target motion, resulting in poor detection and tracking performance for low signal-to-clutter ratio (SCR) targets. In this article, we analyze the difficulty based on spatial features and the feasibility based on temporal features of realizing effective detection. According to this analysis, we use a multi-frame as a detection unit and propose a detection method based on temporal energy selective scaling (TESS). Specifically, we investigated the composition of intensity temporal profiles (ITPs) formed by pixels on a multi-frame detection unit. For the target-present pixel, the target passing through the pixel will bring a weak transient disturbance on the ITP and introduce a change in the statistical properties of ITP. We use a well-designed function to amplify the transient disturbance, suppress the background and noise components, and output the trajectory of the target on the multi-frame detection unit. Subsequently, to solve the contradiction between the detection rate and the false alarm rate brought by the traditional threshold segmentation, we associate the temporal and spatial features of the output trajectory and propose a trajectory extraction method based on the 3D Hough transform. Finally, we model the trajectory of the target and propose a trajectory-based multi-target tracking method. Compared with the various state-of-the-art detection and tracking methods, experiments in multiple scenarios prove the superiority of our proposed methods.
https://arxiv.org/abs/2405.09054
We investigate the problem of pixelwise correspondence for deformable objects, namely cloth and rope, by comparing both classical and learning-based methods. We choose cloth and rope because they are traditionally some of the most difficult deformable objects to model analytically given their large configuration space, and they are meaningful in the context of robotic tasks like cloth folding, rope knot-tying, T-shirt folding, curtain closing, etc. The correspondence problem is heavily motivated in robotics, with wide-ranging applications including semantic grasping, object tracking, and manipulation policies built on top of correspondences. We present an exhaustive survey of existing classical methods for doing correspondence via feature-matching, including SIFT, SURF, and ORB, and two recently published learning-based methods, TimeCycle and Dense Object Nets. We make three main contributions: (1) a framework for simulating and rendering synthetic images of deformable objects, with qualitative results demonstrating transfer between our simulated and real domains; (2) a new learning-based correspondence method extending Dense Object Nets; and (3) a standardized comparison across state-of-the-art correspondence methods. Our proposed method provides a flexible, general formulation for learning temporally and spatially continuous correspondences for nonrigid (and rigid) objects. We report root mean squared error statistics for all methods and find that Dense Object Nets outperforms baseline classical methods for correspondence, and our proposed extension of Dense Object Nets performs similarly.
https://arxiv.org/abs/2405.08996
Many query-based approaches for 3D Multi-Object Tracking (MOT) adopt the tracking-by-attention paradigm, utilizing track queries for identity-consistent detection and object queries for identity-agnostic track spawning. Tracking-by-attention, however, entangles detection and tracking queries in one embedding for both the detection and tracking task, which is sub-optimal. Other approaches resemble the tracking-by-detection paradigm, detecting objects using decoupled track and detection queries followed by a subsequent association. These methods, however, do not leverage synergies between the detection and association task. Combining the strengths of both paradigms, we introduce ADA-Track, a novel end-to-end framework for 3D MOT from multi-view cameras. We introduce a learnable data association module based on edge-augmented cross-attention, leveraging appearance and geometric features. Furthermore, we integrate this association module into the decoder layer of a DETR-based 3D detector, enabling simultaneous DETR-like query-to-image cross-attention for detection and query-to-query cross-attention for data association. By stacking these decoder layers, queries are refined for the detection and association task alternately, effectively harnessing the task dependencies. We evaluate our method on the nuScenes dataset and demonstrate the advantage of our approach compared to the two previous paradigms. Code is available at this https URL.
https://arxiv.org/abs/2405.08909
Non-prehensile manipulation enables fast interactions with objects by circumventing the need to grasp and ungrasp as well as handling objects that cannot be grasped through force closure. Current approaches to non-prehensile manipulation focus on static contacts, avoiding the underactuation that comes with sliding. However, the ability to control sliding contact, essentially removing the no-slip constraint, opens up new possibilities in dynamic manipulation. In this paper, we explore a challenging dynamic non-prehensile manipulation task that requires the consideration of the full spectrum of hybrid contact modes. We leverage recent methods in contact-implicit MPC to handle the multi-modal planning aspect of the task. We demonstrate, with careful consideration of integration between the simple model used for MPC and the low-level tracking controller, how contact-implicit MPC can be adapted to dynamic tasks. Surprisingly, despite the known inaccuracies of frictional rigid contact models, our method is able to react to these inaccuracies while still quickly performing the task. Moreover, we do not use common aids such as reference trajectories or motion primitives, highlighting the generality of our approach. To the best of our knowledge, this is the first application of contact-implicit MPC to a dynamic manipulation task in three dimensions.
https://arxiv.org/abs/2405.08731
Tissue tracking in echocardiography is challenging due to the complex cardiac motion and the inherent nature of ultrasound acquisitions. Although optical flow methods are considered state-of-the-art (SOTA), they struggle with long-range tracking, noise occlusions, and drift throughout the cardiac cycle. Recently, novel learning-based point tracking techniques have been introduced to tackle some of these issues. In this paper, we build upon these techniques and introduce EchoTracker, a two-fold coarse-to-fine model that facilitates the tracking of queried points on a tissue surface across ultrasound image sequences. The architecture contains a preliminary coarse initialization of the trajectories, followed by reinforcement iterations based on fine-grained appearance changes. It is efficient, light, and can run on mid-range GPUs. Experiments demonstrate that the model outperforms SOTA methods, with an average position accuracy of 67% and a median trajectory error of 2.86 pixels. Furthermore, we show a relative improvement of 25% when using our model to calculate the global longitudinal strain (GLS) in a clinical test-retest dataset compared to other methods. This implies that learning-based point tracking can potentially improve performance and yield a higher diagnostic and prognostic value for clinical measurements than current techniques. Our source code is available at: this https URL.
https://arxiv.org/abs/2405.08587
Safe maneuvering capability is critical for mobile robots in complex environments. However, robotic system dynamics are often time-varying, uncertain, or even unknown during the motion planning and control process. Therefore, many existing model-based reinforcement learning (RL) methods could not achieve satisfactory reliability in guaranteeing safety. To address this challenge, we propose a two-level Vector Field-guided Learning Predictive Control (VF-LPC) approach that guarantees safe maneuverability. The first level, the guiding level, generates safe desired trajectories using the designed kinodynamic guiding vector field, enabling safe motion in obstacle-dense environments. The second level, the Integrated Motion Planning and Control (IMPC) level, first uses the deep Koopman operator to learn a nominal dynamics model offline and then updates the model uncertainties online using sparse Gaussian processes (GPs). The learned dynamics and game-based safe barrier function are then incorporated into the learning predictive control framework to generate near-optimal control sequences. We conducted tests to compare the performance of VF-LPC with existing advanced planning methods in an obstacle-dense environment. The simulation results show that it can generate feasible trajectories quickly. Then, VF-LPC is evaluated against motion planning methods that employ model predictive control (MPC) and RL in high-fidelity CarSim software. The results show that VF-LPC outperforms them under metrics of completion time, route length, and average solution time. We also carried out path-tracking control tests on a racing road to validate the model uncertainties learning capability. Finally, we conducted real-world experiments on a Hongqi E-HS3 vehicle, further validating the VF-LPC approach's effectiveness.
https://arxiv.org/abs/2405.08283
An authentic 3D hand avatar with all identifiable information, such as hand shapes and textures, is necessary for immersive experiences in AR/VR. In this paper, we present a universal hand model (UHM), which 1) can universally represent high-fidelity 3D hand meshes of arbitrary identities (IDs) and 2) can be adapted to each person with a short phone scan for the authentic hand avatar. For effective universal hand modeling, we perform tracking and modeling at the same time, while previous 3D hand models perform them separately. The conventional separate pipeline suffers from errors accumulated in the tracking stage, which cannot be recovered in the modeling stage. In contrast, ours does not suffer from accumulated errors and has a much more concise overall pipeline. We additionally introduce a novel image matching loss function to address skin sliding during tracking and modeling, an issue that existing works have largely overlooked. Finally, using learned priors from our UHM, we effectively adapt our UHM to each person's short phone scan for the authentic hand avatar.
https://arxiv.org/abs/2405.07933
Many aging individuals encounter challenges in effectively tracking their dietary intake, exacerbating their susceptibility to nutrition-related health complications. Self-reporting methods are often inaccurate and suffer from substantial bias; however, leveraging intelligent prediction methods can automate and enhance precision in this process. Recent work has explored using computer vision prediction systems to predict nutritional information from food images. Still, these methods are often tailored to specific situations, require other inputs in addition to a food image, or do not provide comprehensive nutritional information. This paper aims to enhance the efficacy of dietary intake estimation by leveraging various neural network architectures to directly predict a meal's nutritional content from its image. Through comprehensive experimentation and evaluation, we present NutritionVerse-Direct, a model utilizing a vision transformer base architecture with three fully connected layers that lead to five regression heads predicting calories (kcal), mass (g), protein (g), fat (g), and carbohydrates (g) present in a meal. NutritionVerse-Direct yields a combined mean average error score on the NutritionVerse-Real dataset of 412.6, an improvement of 25.5% over the Inception-ResNet model, demonstrating its potential for improving dietary intake estimation accuracy.
https://arxiv.org/abs/2405.07814