Robot multimodal locomotion encompasses the ability to transition between walking and flying, representing a significant challenge in robotics. This work presents an approach that enables automatic smooth transitions between legged and aerial locomotion. Leveraging the concept of Adversarial Motion Priors, our method allows the robot to imitate motion datasets and accomplish the desired task without the need for complex reward functions. The robot learns walking patterns from human-like gaits and aerial locomotion patterns from motions obtained using trajectory optimization. Through this process, the robot adapts the locomotion scheme based on environmental feedback using reinforcement learning, with the spontaneous emergence of mode-switching behavior. The results highlight the potential for achieving multimodal locomotion in aerial humanoid robotics through automatic control of walking and flying modes, paving the way for applications in diverse domains such as search and rescue, surveillance, and exploration missions. This research contributes to advancing the capabilities of aerial humanoid robots in terms of versatile locomotion in various environments.
https://arxiv.org/abs/2309.12784
The omnidirectional camera is a cost-effective and information-rich sensor highly suitable for many marine applications and the ocean scientific community, encompassing several domains such as augmented reality, mapping, motion estimation, visual surveillance, and simultaneous localization and mapping. However, designing and constructing such a high-quality 360$^{\circ}$ real-time streaming camera system for underwater applications is a challenging problem due to the technical complexity in several aspects, including sensor resolution, wide field of view, power supply, optical design, system calibration, and overheating management. This paper presents a novel and comprehensive system that addresses the complexities associated with the design, construction, and implementation of a fully functional 360$^{\circ}$ real-time streaming camera system specifically tailored for underwater environments. Our proposed system, UWA360CAM, can stream video in real time, operate 24/7, and capture 360$^{\circ}$ underwater panorama images. Notably, our work is the pioneering effort in providing a detailed and replicable account of this system. The experiments provide a comprehensive analysis of our proposed system.
https://arxiv.org/abs/2309.12668
In surveillance, accurately recognizing license plates is hindered by their often low quality and small dimensions, compromising recognition precision. Despite advancements in AI-based image super-resolution, methods like Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs) still fall short in enhancing license plate images. This study leverages the cutting-edge diffusion model, which has consistently outperformed other deep learning techniques in image restoration. By training this model using a curated dataset of Saudi license plates, in both low and high resolutions, we discovered the diffusion model's superior efficacy. The method achieves a 12.55% and 37.32% improvement in Peak Signal-to-Noise Ratio (PSNR) over SwinIR and ESRGAN, respectively. Moreover, our method surpasses these techniques in terms of Structural Similarity Index (SSIM), registering a 4.89% and 17.66% improvement over SwinIR and ESRGAN, respectively. Furthermore, 92% of human evaluators preferred our images over those from other algorithms. In essence, this research presents a pioneering solution for license plate super-resolution, with tangible potential for surveillance systems.
https://arxiv.org/abs/2309.12506
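The abstract above reports relative gains in PSNR. As a reminder of what that metric measures, here is a minimal sketch of the standard PSNR computation (an illustration of the metric, not the paper's evaluation code):

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two same-shaped images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy check: a constant offset of 16 gray levels on an 8-bit image
ref = np.full((8, 8), 128, dtype=np.uint8)
deg = ref + 16
print(round(psnr(ref, deg), 2))
```

Higher PSNR means the super-resolved image is closer to the ground-truth high-resolution plate; the reported percentages compare this value across methods.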
3D face reconstruction algorithms from images and videos are applied to many fields, from plastic surgery to the entertainment sector, thanks to their advantageous features. However, when looking at forensic applications, 3D face reconstruction must observe strict requirements that still make its possible role in bringing evidence to a lawsuit unclear. An extensive investigation of the constraints, potential, and limits of its application in forensics is still missing. Shedding some light on this matter is the goal of the present survey, which starts by clarifying the relation between forensic applications and biometrics, with a focus on face recognition. It then provides an analysis of the achievements of 3D face reconstruction algorithms from surveillance videos and mugshot images and discusses the current obstacles that separate 3D face reconstruction from an active role in forensic applications. Finally, it examines the underlying data sets, with their advantages and limitations, while proposing alternatives that could substitute or complement them.
https://arxiv.org/abs/2309.11357
Gait recognition (GR) is a growing biometric modality used for person identification from a distance through visual cameras. GR provides a secure and reliable alternative to fingerprint and face recognition, as it is harder to distinguish between false and authentic signals. Furthermore, its resistance to spoofing makes GR suitable for all types of environments. With the rise of deep learning, steady strides have been made in GR technology, with promising results in various contexts. As video surveillance becomes more prevalent, new obstacles arise, such as ensuring uniform performance evaluation across different protocols, reliable recognition despite shifting lighting conditions, fluctuations in gait patterns, and protecting privacy. This survey aims to give an overview of GR and analyze the environmental elements and complications that could affect it in comparison to other biometric recognition systems. The primary goal is to examine the existing deep learning (DL) techniques employed for human GR that may generate new research opportunities.
https://arxiv.org/abs/2309.10144
Face recognition systems have become increasingly vulnerable to security threats in recent years, prompting the use of Face Anti-spoofing (FAS) to protect against various types of attacks, such as phone unlocking, face payment, and self-service security inspection. While FAS has demonstrated its effectiveness in traditional settings, securing it in long-distance surveillance scenarios presents a significant challenge. These scenarios often feature low-quality face images, necessitating the modeling of data uncertainty to improve stability under extreme conditions. To address this issue, this work proposes Distributional Estimation (DisE), a method that converts traditional FAS point estimation to distributional estimation by modeling data uncertainty during training, including feature (mean) and uncertainty (variance). By adjusting the learning strength of clean and noisy samples for stability and accuracy, the learned uncertainty enhances DisE's performance. The method is evaluated on SuHiFiMask [1], a large-scale and challenging FAS dataset in surveillance scenarios. Results demonstrate that DisE achieves comparable performance on both ACER and AUC metrics.
https://arxiv.org/abs/2309.09485
Detecting firearms and accurately localizing individuals carrying them in images or videos is of paramount importance in security, surveillance, and content customization. However, this task presents significant challenges in complex environments due to clutter and the diverse shapes of firearms. To address this problem, we propose a novel approach that leverages human-firearm interaction information, which provides valuable clues for localizing firearm carriers. Our approach incorporates an attention mechanism that effectively distinguishes humans and firearms from the background by focusing on relevant areas. Additionally, we introduce a saliency-driven locality-preserving constraint to learn essential features while preserving foreground information in the input image. By combining these components, our approach achieves exceptional results on a newly proposed dataset. To handle inputs of varying sizes, we pass paired human-firearm instances with attention masks as channels through a deep network for feature computation, utilizing an adaptive average pooling layer. We extensively evaluate our approach against existing methods in human-object interaction detection and achieve significant results (AP=77.8%) compared to the baseline approach (AP=63.1%). This demonstrates the effectiveness of leveraging attention mechanisms and saliency-driven locality preservation for accurate human-firearm interaction detection. Our findings contribute to advancing the fields of security and surveillance, enabling more efficient firearm localization and identification in diverse scenarios.
https://arxiv.org/abs/2309.09236
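The abstract above handles variable-size inputs with an adaptive average pooling layer. The sketch below illustrates what such a layer computes (a plain-numpy illustration of the standard operation, not the paper's network code):

```python
import numpy as np

def adaptive_avg_pool_2d(x: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Average-pool a 2D feature map down to a fixed (out_h, out_w) size,
    so inputs of varying sizes all yield same-sized features."""
    in_h, in_w = x.shape
    out = np.empty((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        r0, r1 = (i * in_h) // out_h, -(-((i + 1) * in_h) // out_h)  # floor/ceil bins
        for j in range(out_w):
            c0, c1 = (j * in_w) // out_w, -(-((j + 1) * in_w) // out_w)
            out[i, j] = x[r0:r1, c0:c1].mean()
    return out

# A 4x4 feature map pooled to a fixed 2x2 output
a = adaptive_avg_pool_2d(np.arange(16, dtype=float).reshape(4, 4), 2, 2)
print(a)
```

Because the bin boundaries scale with the input, a 4x4 map and a 5x7 map both produce a 2x2 output, which is what lets paired human-firearm crops of different sizes feed a fixed-size head.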
To enable the computation of effective randomized patrol routes for single- or multi-robot teams, we present RoSSO, a Python package designed for solving Markov chain optimization problems. We exploit machine-learning techniques such as reverse-mode automatic differentiation and constraint parametrization to achieve superior efficiency compared to general-purpose nonlinear programming solvers. Additionally, we supplement a game-theoretic stochastic surveillance formulation in the literature with a novel greedy algorithm and multi-robot extension. We close with numerical results for a police district in downtown San Francisco that demonstrate RoSSO's capabilities on our new formulations and the prior work.
https://arxiv.org/abs/2309.08742
We address the problem of efficient and unobstructed surveillance or communication in complex environments. On one hand, one wishes to use a minimal number of sensors to cover the environment. On the other hand, it is often important to consider solutions that are robust against sensor failure or adversarial attacks. This paper addresses these challenges of designing minimal sensor sets that achieve multi-coverage constraints -- every point in the environment is covered by a prescribed number of sensors. We propose a greedy algorithm to achieve the objective. Further, we explore deep learning techniques to accelerate the evaluation of the objective function formulated in the greedy algorithm. The training of the neural network reveals that the geometric properties of the data significantly impact the network's performance, particularly at the end stage. By taking into account these properties, we discuss the differences in using greedy and $\epsilon$-greedy algorithms to generate data and their impact on the robustness of the network.
https://arxiv.org/abs/2309.08545
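The abstract above proposes a greedy algorithm for multi-coverage (every point covered by a prescribed number of sensors). A minimal sketch of such a greedy routine, assuming a finite candidate set with known coverage sets (the paper's algorithm, objective evaluation, and data structures may differ):

```python
def greedy_multicover(candidates, points, k):
    """Greedily pick sensors until every point is covered by at least k
    of them.  `candidates` maps a sensor id to the set of points it sees."""
    demand = {p: k for p in points}      # residual coverage demand per point
    available = dict(candidates)
    chosen = []

    def gain(sensor):
        # How much residual demand this sensor would satisfy if chosen.
        return sum(1 for p in available[sensor] if demand.get(p, 0) > 0)

    while any(d > 0 for d in demand.values()):
        best = max(available, key=gain)
        if gain(best) == 0:
            raise ValueError("demand not satisfiable with these candidates")
        chosen.append(best)
        for p in available.pop(best):
            if demand.get(p, 0) > 0:
                demand[p] -= 1
    return chosen

# Toy instance: 4 points, each must be seen by at least 2 sensors
cands = {"s1": {1, 2}, "s2": {2, 3}, "s3": {3, 4}, "s4": {1, 4}, "s5": {1, 2, 3, 4}}
print(greedy_multicover(cands, [1, 2, 3, 4], k=2))
```

Repeatedly choosing the sensor with maximum marginal gain is the classical greedy heuristic for set multicover; the paper's deep-learning component accelerates exactly this kind of gain evaluation.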
Real-time transportation surveillance is an essential part of the intelligent transportation system (ITS). However, images captured under low-light conditions often suffer from poor visibility and various types of degradation, such as noise interference and blurred edge features. With the development of imaging devices, the quality of visual surveillance data is continually increasing, e.g., to 2K and 4K, which imposes stricter requirements on the efficiency of image processing. To satisfy the requirements on both enhancement quality and computational speed, this paper proposes a double-domain-guided real-time low-light image enhancement network (DDNet) for ultra-high-definition (UHD) transportation surveillance. Specifically, we design an encoder-decoder structure as the main architecture of the learning network. In particular, the enhancement processing is divided into two subtasks (i.e., color enhancement and gradient enhancement) via the proposed coarse enhancement module (CEM) and LoG-based gradient enhancement module (GEM), which are embedded in the encoder-decoder structure. This enables the network to enhance the color and edge features simultaneously. Through decomposition and reconstruction on both the color and gradient domains, our DDNet can restore the detailed feature information concealed by the darkness with better visual quality and efficiency. Evaluation experiments on standard and transportation-related datasets demonstrate that our DDNet provides superior enhancement quality and efficiency compared with state-of-the-art methods. Besides, object detection and scene segmentation experiments indicate the practical benefits for higher-level image analysis under low-light environments in ITS.
https://arxiv.org/abs/2309.08382
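The gradient branch above is built on the Laplacian of Gaussian (LoG). For reference, a small numpy sketch of constructing a LoG kernel (an illustration of the operator itself, not the paper's GEM module; the size and sigma are arbitrary):

```python
import numpy as np

def log_kernel(size: int = 7, sigma: float = 1.0) -> np.ndarray:
    """Laplacian-of-Gaussian kernel: the second derivative of a Gaussian,
    which responds strongly to edges and fine structure."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    r2 = x ** 2 + y ** 2
    g = np.exp(-r2 / (2 * sigma ** 2))
    k = (r2 - 2 * sigma ** 2) / sigma ** 4 * g
    return k - k.mean()  # zero-sum so flat regions give zero response

k = log_kernel()
print(round(k.sum(), 10))  # ~0: the filter ignores constant intensity
```

Convolving a dark image with such a kernel exposes edge structure that a color-only enhancement path tends to wash out, which motivates handling color and gradient in separate subtasks.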
Unmanned Aerial Vehicles (UAVs) have gained significant prominence in recent years for areas including surveillance, search, rescue, and package delivery. One key aspect of UAV operations shared across all these tasks is autonomous path planning, which enables a UAV to navigate through complex, unknown, and dynamic environments while avoiding obstacles without human control. Despite countless efforts having been devoted to this subject, new challenges constantly arise due to the persistent trade-off between performance and cost. New studies are urgently needed to develop autonomous systems for UAVs with a parsimonious sensor setup, which is a major need for wider adoption. To this end, we propose an end-to-end autonomous framework that enables UAVs with only a single 2D-LiDAR sensor to operate in unknown dynamic environments. More specifically, we break our approach into three stages: a pre-processing Map Constructor; an offline Mission Planner; and an online reinforcement learning (RL)-based Dynamic Obstacle Handler. Experiments show that our approach provides robust and reliable dynamic path planning and obstacle avoidance with only 1/10 of the cost in sensor configuration. The code will be made public upon acceptance.
https://arxiv.org/abs/2309.08095
Robot vision often involves a large computational load due to large images to process in a short amount of time. Existing solutions often involve reducing image quality which can negatively impact processing. Another approach is to generate regions of interest with expensive vision algorithms. In this paper, we evaluate how audio can be used to generate regions of interest in optical images. To achieve this, we propose a unique attention mechanism to localize speech sources and evaluate its impact on a face detection algorithm. Our results show that the attention mechanism reduces the computational load. The proposed pipeline is flexible and can be easily adapted for human-robot interactions, robot surveillance, video-conferences or smart glasses.
https://arxiv.org/abs/2309.08005
Multi-Agent Reinforcement Learning (MARL) has achieved significant success in large-scale AI systems and big-data applications such as smart grids, surveillance, etc. Existing advancements in MARL algorithms focus on improving the rewards obtained by introducing various mechanisms for inter-agent cooperation. However, these optimizations are usually compute- and memory-intensive, thus leading to suboptimal speed performance in end-to-end training time. In this work, we analyze the speed performance (i.e., latency-bounded throughput) as the key metric in MARL implementations. Specifically, we first introduce a taxonomy of MARL algorithms from an acceleration perspective categorized by (1) training scheme and (2) communication method. Using our taxonomy, we identify three state-of-the-art MARL algorithms - Multi-Agent Deep Deterministic Policy Gradient (MADDPG), Target-oriented Multi-agent Communication and Cooperation (ToM2C), and Networked Multi-Agent RL (NeurComm) - as target benchmark algorithms, and provide a systematic analysis of their performance bottlenecks on a homogeneous multi-core CPU platform. We justify the need for MARL latency-bounded throughput to be a key performance metric in future literature while also addressing opportunities for parallelization and acceleration.
https://arxiv.org/abs/2309.07108
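The survey above argues for latency-bounded throughput as the key MARL metric. One simple way to read that metric is throughput credited only to iterations that meet a latency bound; the sketch below is my illustrative formulation, not the paper's exact definition:

```python
def latency_bounded_throughput(samples_processed, latencies, bound):
    """Samples per second, counting only iterations whose latency stays
    within the bound (an illustrative reading of 'latency-bounded
    throughput'; the paper may define it differently)."""
    total_samples = 0.0
    total_time = 0.0
    for n, t in zip(samples_processed, latencies):
        if t <= bound:
            total_samples += n
            total_time += t
    return total_samples / total_time if total_time > 0 else 0.0

# Three training iterations of 64 samples; the 0.30 s one violates a 0.25 s bound
print(latency_bounded_throughput([64, 64, 64], [0.20, 0.30, 0.20], 0.25))
```

A metric of this shape penalizes cooperation mechanisms that raise average rewards but blow the per-iteration latency budget, which is the trade-off the survey highlights.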
The Internet of Medical Things (IoMT) is a platform that combines Internet of Things (IoT) technology with medical applications, enabling the realization of precision medicine, intelligent healthcare, and telemedicine in the era of digitalization and intelligence. However, the IoMT faces various challenges, including sustainable power supply, the human adaptability of sensors, and sensor intelligence. In this study, we designed a robust and intelligent IoMT system through the synergistic integration of flexible wearable triboelectric sensors and deep learning-assisted data analytics. We embedded four triboelectric sensors into a wristband to detect and analyze limb movements in patients suffering from Parkinson's Disease (PD). By further integrating deep learning-assisted data analytics, we actualized an intelligent healthcare monitoring system for the surveillance of and interaction with PD patients, which includes location/trajectory tracking, heart monitoring, and identity recognition. This innovative approach enabled us to accurately capture and scrutinize the subtle movements and fine motor skills of PD patients, thus providing insightful feedback and a comprehensive assessment of the patients' conditions. This monitoring system is cost-effective, easily fabricated, highly sensitive, and intelligent, consequently underscoring the immense potential of human body sensing technology in a Health 4.0 society.
https://arxiv.org/abs/2309.07185
Innovative enhancements in embedded system platforms, specifically hardware accelerators, significantly influence the application of deep learning in real-world scenarios. These innovations translate human labor into automated intelligent systems employed in various areas such as autonomous driving, robotics, the Internet of Things (IoT), and numerous other impactful applications. NVIDIA's Jetson platform is one of the pioneers in offering optimal performance regarding energy efficiency and throughput in the execution of deep learning algorithms. Previously, most benchmarking analysis was based on 2D images, with a single deep learning model for each comparison result. In this paper, we implement an end-to-end video-based crime-scene anomaly detection system that takes surveillance videos as input; the system is deployed on and completely operates across multiple Jetson edge devices (Nano, AGX Xavier, Orin Nano). The comparison analysis includes the integration of Torch-TensorRT, a software development kit from NVIDIA, for model performance optimisation. The system is built with the PySlowFast open-source project from Facebook as the coding template. The end-to-end pipeline comprises camera video capture, a data preprocessing pipeline, a feature extractor, and the anomaly detector. We share the experience of deploying an AI-based system on various Jetson edge devices with Docker technology. As the anomaly detector, a weakly supervised video-based deep learning model called Robust Temporal Feature Magnitude Learning (RTFM) is applied in the system. The system reaches an inference speed of 47.56 frames per second (FPS) on a Jetson edge device with a total RAM usage of only 3.11 GB. We also find that on the most promising Jetson device, the AI system achieves 15% better performance than the previous generation of Jetson devices while consuming 50% less energy.
https://arxiv.org/abs/2307.16834
Machine learning is at the center of mainstream technology and outperforms classical approaches to handcrafted feature design. Aside from its learning process for artificial feature extraction, it has an end-to-end paradigm from input to output, reaching outstandingly accurate results. However, security concerns about its robustness to malicious and imperceptible perturbations have drawn attention since its prediction can be changed entirely. Salient object detection is a research area where deep convolutional neural networks have proven effective but whose trustworthiness represents a significant issue requiring analysis and solutions to hackers' attacks. Brain programming is a kind of symbolic learning in the vein of good old-fashioned artificial intelligence. This work provides evidence that symbolic learning robustness is crucial in designing reliable visual attention systems since it can withstand even the most intense perturbations. We test this evolutionary computation methodology against several adversarial attacks and noise perturbations using standard databases and a real-world problem of a shorebird called the Snowy Plover portraying a visual attention task. We compare our methodology with five different deep learning approaches, proving that they do not match the symbolic paradigm regarding robustness. All neural networks suffer significant performance losses, while brain programming stands its ground and remains unaffected. Also, by studying the Snowy Plover, we remark on the importance of security in surveillance activities regarding wildlife protection and conservation.
https://arxiv.org/abs/2309.05900
Most studies to date that have examined demographic variations in face recognition accuracy have analyzed 1-to-1 matching accuracy, using images that could be described as "government ID quality". This paper analyzes the accuracy of 1-to-many facial identification across demographic groups, and in the presence of blur and reduced resolution in the probe image as might occur in "surveillance camera quality" images. Cumulative match characteristic (CMC) curves are not appropriate for comparing the propensity for rank-one recognition errors across demographics, so we introduce three metrics for this: (1) the d' metric between mated and non-mated score distributions, (2) the absolute score difference between thresholds in the high-similarity tail of the non-mated and the low-similarity tail of the mated distribution, and (3) the distribution of (mated - non-mated) rank-one scores across the set of probe images. We find that demographic variation in 1-to-many accuracy does not entirely follow what has been observed in 1-to-1 matching accuracy. Also, different from 1-to-1 accuracy, demographic comparison of 1-to-many accuracy can be affected by different numbers of identities and images across demographics. Finally, we show that increased blur in the probe image, or reduced resolution of the face in the probe image, can significantly increase the false positive identification rate. We also show that the demographic variation in these high-blur or low-resolution conditions is much larger for male/female than for African-American/Caucasian. The point that 1-to-many accuracy can potentially collapse in the context of processing "surveillance camera quality" probe images against a "government ID quality" gallery is an important one.
https://arxiv.org/abs/2309.04447
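The first metric above is the d' separation between mated and non-mated score distributions. A minimal sketch of the standard d' computation (difference of means over pooled standard deviation; the paper's exact variant may differ):

```python
import math

def d_prime(mated, non_mated):
    """Separation between mated and non-mated similarity-score
    distributions: difference of means over the pooled standard deviation."""
    def stats(xs):
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / len(xs)
        return m, v
    m1, v1 = stats(mated)
    m0, v0 = stats(non_mated)
    return abs(m1 - m0) / math.sqrt((v1 + v0) / 2)

# Toy scores: well-separated distributions give a large d'
print(round(d_prime([0.9, 0.8, 0.85], [0.2, 0.3, 0.25]), 3))
```

A larger d' means mated and non-mated scores overlap less, so rank-one identification errors for that demographic group are less likely.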
Several interesting problems in multi-robot systems can be cast in the framework of distributed optimization. Examples include multi-robot task allocation, vehicle routing, target protection and surveillance. While the theoretical analysis of distributed optimization algorithms has received significant attention, its application to cooperative robotics has not been investigated in detail. In this paper, we show how notable scenarios in cooperative robotics can be addressed by suitable distributed optimization setups. Specifically, after a brief introduction on the widely investigated consensus optimization (most suited for data analytics) and on the partition-based setup (matching the graph structure in the optimization), we focus on two distributed settings modeling several scenarios in cooperative robotics, i.e., the so-called constraint-coupled and aggregative optimization frameworks. For each one, we consider use-case applications, and we discuss tailored distributed algorithms with their convergence properties. Then, we revise state-of-the-art toolboxes allowing for the implementation of distributed schemes on real networks of robots without central coordinators. For each use case, we discuss their implementation in these toolboxes and provide simulations and real experiments on networks of heterogeneous robots.
https://arxiv.org/abs/2309.04257
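The consensus optimization setup mentioned above rests on robots iteratively averaging with their neighbors. A minimal sketch of consensus averaging on a fixed undirected network (an illustration of the mechanism only, with a made-up ring topology; not one of the paper's toolboxes):

```python
import numpy as np

# Hypothetical 4-robot ring network: doubly stochastic mixing matrix W,
# W[i, j] > 0 only if robots i and j are neighbors (or i == j).
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

x = np.array([1.0, 5.0, 3.0, 7.0])  # each robot's local estimate
for _ in range(100):
    x = W @ x                        # every robot averages with its neighbors
print(np.round(x, 4))  # all entries converge to the network average, 4.0
```

No central coordinator is involved: each robot only uses its own row of W, and because W is doubly stochastic the estimates converge to the network-wide average, the building block that distributed optimization algorithms extend with local gradient steps.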
The use of Unmanned Aerial Vehicles (UAVs) is rapidly increasing in applications ranging from surveillance and first-aid missions to industrial automation involving cooperation with other machines or humans. To maximize area coverage and reduce mission latency, swarms of collaborating drones have become a significant research direction. However, this approach requires open challenges in positioning, mapping, and communications to be addressed. This work describes a distributed mapping system based on a swarm of nano-UAVs, characterized by a limited payload of 35 g and tightly constrained on-board sensing and computing capabilities. Each nano-UAV is equipped with four 64-pixel depth sensors that measure the relative distance to obstacles in four directions. The proposed system merges the information from the swarm and generates a coherent grid map without relying on any external infrastructure. The data fusion is performed using the iterative closest point algorithm and a graph-based simultaneous localization and mapping algorithm, running entirely on-board the UAV's low-power ARM Cortex-M microcontroller with just 192 kB of SRAM memory. Field results gathered in three different mazes from a swarm of up to 4 nano-UAVs demonstrate a mapping accuracy of 12 cm and show that the mapping time is inversely proportional to the number of agents. The proposed framework scales linearly in terms of communication bandwidth and on-board computational complexity, supporting communication between up to 20 nano-UAVs and mapping of areas up to 180 m$^2$, with the chosen configuration requiring only 50 kB of memory.
https://arxiv.org/abs/2309.03678
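The data fusion above relies on the iterative closest point (ICP) algorithm, whose inner step is a least-squares rigid alignment of matched point sets (the SVD-based Kabsch solution). A 2D sketch of that inner step, for illustration only (not the paper's on-board firmware, which must also find the correspondences):

```python
import numpy as np

def rigid_align(src: np.ndarray, dst: np.ndarray):
    """Best rotation R and translation t (least squares) mapping matched
    2D points src onto dst -- the inner step of each ICP iteration."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)          # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

# Recover a known 90-degree rotation plus translation from matched points
theta = np.pi / 2
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]])
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dst = src @ R_true.T + np.array([2.0, 3.0])
R, t = rigid_align(src, dst)
print(np.allclose(R, R_true), np.allclose(t, [2.0, 3.0]))
```

Full ICP alternates this alignment with re-matching each source point to its nearest destination point; the closed-form SVD step is what makes it cheap enough for a 192 kB microcontroller.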
This paper presents a method for determining the area explored by a line-sweep sensor during an area-covering mission in a two-dimensional plane. Accurate knowledge of the explored area is crucial for various applications in robotics, such as mapping, surveillance, and coverage optimization. The proposed method leverages the concept of coverage measure of the environment and its relation to the topological degree in the plane, to estimate the extent of the explored region. In addition, we extend the approach to uncertain coverage measure values using interval analysis. This last contribution allows for a guaranteed characterization of the explored area, essential considering the often critical character of area-covering missions. Finally, this paper also proposes a novel algorithm for computing the topological degree in the 2-dimensional plane, for all the points inside an area of interest, which differs from existing solutions that compute the topological degree for single points. The applicability of the method is evaluated through a real-world experiment.
https://arxiv.org/abs/2309.03604
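For a single query point, the topological degree used above reduces to the winding number of the explored-region boundary around that point. A minimal sketch of that classical single-point computation (the paper's contribution is an all-points algorithm, which differs from this):

```python
import math

def winding_number(point, polygon):
    """Winding number of a closed polygonal curve around a point: the
    topological degree of the boundary map (+/-1 inside, 0 outside)."""
    px, py = point
    total = 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i][0] - px, polygon[i][1] - py
        x1, y1 = polygon[(i + 1) % n][0] - px, polygon[(i + 1) % n][1] - py
        # signed angle swept by this edge, as seen from the query point
        total += math.atan2(x0 * y1 - x1 * y0, x0 * x1 + y0 * y1)
    return round(total / (2 * math.pi))

square = [(0, 0), (2, 0), (2, 2), (0, 2)]  # counter-clockwise boundary
print(winding_number((1, 1), square), winding_number((3, 3), square))
```

A nonzero degree certifies that the point lies inside the swept region, which is what permits a guaranteed (interval-based) characterization of the explored area.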