Complementary to prevalent LiDAR and camera systems, millimeter-wave (mmWave) radar is robust to adverse weather conditions such as fog, rainstorms, and blizzards, but produces only sparse point clouds. Current techniques enhance radar point clouds under supervision from LiDAR data. However, high-performance LiDAR is notably expensive and not commonly available on vehicles. This paper presents mmEMP, a supervised learning approach that enhances radar point clouds using a low-cost camera and an inertial measurement unit (IMU), enabling training data to be crowdsourced from commercial vehicles. Introducing visual-inertial (VI) supervision is challenging because the spatial positions of dynamic objects are unknown. Moreover, spurious radar points arising from RF multipath can cause robots to misinterpret the scene. mmEMP first devises a dynamic 3D reconstruction algorithm that restores the 3D positions of dynamic features. Then, we design a neural network that densifies radar data and eliminates spurious radar points. We build a new real-world dataset. Extensive experiments show that mmEMP achieves competitive performance compared with the state-of-the-art approach trained on LiDAR data. In addition, we use the enhanced point cloud to perform object detection, localization, and mapping to demonstrate mmEMP's effectiveness.
https://arxiv.org/abs/2404.17229
Autonomous Unmanned Aerial Vehicles (UAVs) have become essential tools in defense, law enforcement, disaster response, and product delivery. These autonomous navigation systems require a wireless communication network and, increasingly, rely on deep learning models. In critical scenarios such as border protection or disaster response, ensuring the secure navigation of autonomous UAVs is paramount. However, these UAVs are susceptible to adversarial attacks through the communication network or the deep learning models themselves, including eavesdropping, man-in-the-middle, membership inference, and reconstruction attacks. To address this susceptibility, we propose an innovative approach that combines Reinforcement Learning (RL) and Fully Homomorphic Encryption (FHE) for secure autonomous UAV navigation. This end-to-end secure framework is designed for real-time video feeds captured by UAV cameras and utilizes FHE to perform inference on encrypted input images. While FHE allows computations on encrypted data, certain computational operators are yet to be implemented. Convolutional neural networks, fully connected neural networks, activation functions, and the OpenAI Gym library are meticulously adapted to the FHE domain to enable encrypted data processing. We demonstrate the efficacy of our proposed approach through extensive experimentation. Our proposed approach ensures security and privacy in autonomous UAV navigation with negligible loss in performance.
https://arxiv.org/abs/2404.17225
Simultaneous localization and mapping (SLAM), i.e., the reconstruction of the environment represented by a (3D) map and the concurrent pose estimation, has made astonishing progress. Meanwhile, large-scale applications aimed at data collection in complex environments such as factory halls or construction sites are becoming feasible. However, in contrast to small-scale scenarios with building interiors separated into single rooms, shop floors and construction areas require measurements at larger distances, in potentially texture-less areas, and under difficult illumination. Pose estimation is further aggravated because, as is usual for such indoor applications, no GNSS measurements are available. In our work, we realize data collection in a large factory hall with a robot system equipped with four stereo cameras as well as a 3D laser scanner. We apply our state-of-the-art LiDAR and visual SLAM approaches and discuss the respective pros and cons of the different sensor types for trajectory estimation and dense map generation in such an environment. Additionally, dense and accurate depth maps are generated by 3D Gaussian splatting, which we plan to use in our project on automated construction and site monitoring.
https://arxiv.org/abs/2404.17215
Recently, fiber optic sensors such as fiber Bragg gratings (FBGs) have been widely investigated for shape reconstruction and force estimation of flexible surgical robots. However, most existing approaches require precise model parameters of the FBGs inside the fiber, and their alignment with the flexible robots, to obtain accurate sensing results. Another challenge lies in acquiring, online, the external forces at arbitrary locations along the flexible robots, which is highly desirable when large deflections occur in robotic surgery. In this paper, we propose a novel data-driven paradigm for the simultaneous estimation of shape and force along highly deformable flexible robots, using sparse strain measurements from a single-core FBG fiber. A thin-walled soft sensing tube helically embedded with FBG sensors is designed for a robot-assisted flexible ureteroscope with large deflections of up to 270 degrees and a bend radius under 10 mm. We introduce and study three learning models that incorporate spatial strain encoders, and compare their performances in both free space and constrained environments with contact forces at different locations. The experimental results in terms of dynamic shape-force sensing accuracy demonstrate the effectiveness and superiority of the proposed methods.
https://arxiv.org/abs/2404.16952
The NIR-to-RGB spectral domain translation is a formidable task due to the inherent spectral mapping ambiguities between NIR inputs and RGB outputs. Existing methods thus fail to reconcile the tension between maintaining texture detail fidelity and achieving diverse color variations. In this paper, we propose a Multi-scale HSV Color Feature Embedding Network (MCFNet) that decomposes the mapping process into three sub-tasks: NIR texture maintenance, coarse geometry reconstruction, and RGB color prediction. Accordingly, we propose one key module for each sub-task: the Texture Preserving Block (TPB), the HSV Color Feature Embedding Module (HSV-CFEM), and the Geometry Reconstruction Module (GRM). These modules let MCFNet tackle spectral translation methodically through a series of escalating resolutions, progressively enriching images with color and texture fidelity in a scale-coherent fashion. The proposed MCFNet demonstrates substantial performance gains on the NIR image colorization task. Code is released at: this https URL.
https://arxiv.org/abs/2404.16685
While neural implicit representations have gained popularity in multi-view 3D reconstruction, previous work struggles to yield physically plausible results, thereby limiting their applications in physics-demanding domains like embodied AI and robotics. The lack of plausibility originates from both the absence of physics modeling in the existing pipeline and their inability to recover intricate geometrical structures. In this paper, we introduce PhyRecon, which stands as the first approach to harness both differentiable rendering and differentiable physics simulation to learn implicit surface representations. Our framework proposes a novel differentiable particle-based physical simulator seamlessly integrated with the neural implicit representation. At its core is an efficient transformation between SDF-based implicit representation and explicit surface points by our proposed algorithm, Surface Points Marching Cubes (SP-MC), enabling differentiable learning with both rendering and physical losses. Moreover, we model both rendering and physical uncertainty to identify and compensate for the inconsistent and inaccurate monocular geometric priors. The physical uncertainty additionally enables a physics-guided pixel sampling to enhance the learning of slender structures. By amalgamating these techniques, our model facilitates efficient joint modeling with appearance, geometry, and physics. Extensive experiments demonstrate that PhyRecon significantly outperforms all state-of-the-art methods in terms of reconstruction quality. Our reconstruction results also yield superior physical stability, verified by Isaac Gym, with at least a 40% improvement across all datasets, opening broader avenues for future physics-based applications.
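The paper's SP-MC algorithm converts an SDF-based implicit representation into explicit surface points. As a rough illustration of that conversion (not the actual SP-MC algorithm), the sketch below samples an analytic sphere SDF on a regular grid and keeps the centres of cells where the sign of the SDF flips; the grid resolution and all names here are illustrative assumptions.

```python
import numpy as np

def sdf_sphere(p, r=0.6):
    # analytic signed distance to a sphere of radius r at the origin
    return np.linalg.norm(p, axis=-1) - r

n = 32
ax = np.linspace(-1, 1, n)
grid = np.stack(np.meshgrid(ax, ax, ax, indexing="ij"), axis=-1)
d = sdf_sphere(grid)

# a sample is "on the surface" if the SDF changes sign toward any axis neighbour
surf = np.zeros_like(d, dtype=bool)
for axis in range(3):
    flip = np.diff(np.sign(d), axis=axis) != 0
    idx = [slice(None)] * 3
    idx[axis] = slice(0, n - 1)
    surf[tuple(idx)] |= flip

points = grid[surf]  # explicit surface points extracted from the implicit SDF
```

Because the sphere SDF is 1-Lipschitz, every extracted point lies within one grid spacing of the true surface, which is the kind of explicit/implicit consistency a differentiable extraction needs.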
https://arxiv.org/abs/2404.16666
Efficient visual perception using mobile systems is crucial, particularly in unknown environments such as search and rescue operations, where swift and comprehensive perception of objects of interest is essential. In such real-world applications, objects of interest are often situated in complex environments, making the selection of the 'Next Best' view based solely on maximizing visibility gain suboptimal. Semantics, providing a higher-level interpretation of perception, should significantly contribute to the selection of the next viewpoint for various perception tasks. In this study, we formulate a novel information gain that integrates both visibility gain and semantic gain in a unified form to select the semantic-aware Next-Best-View. Additionally, we design an adaptive strategy with a termination criterion to support a two-stage search-and-acquisition manoeuvre on multiple objects of interest, aided by a multi-degree-of-freedom (Multi-DoF) mobile system. Several semantically relevant reconstruction metrics, including perspective directivity and the region of interest (ROI)-to-full reconstruction volume ratio, are introduced to evaluate the performance of the proposed approach. Simulation experiments demonstrate the advantages of the proposed approach over existing methods, achieving improvements of up to 27.13% in the ROI-to-full reconstruction volume ratio and an average perspective directivity of 0.88234. Furthermore, the planned motion trajectory exhibits better perception coverage of the target.
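The unified information gain can be pictured as a weighted combination of a visibility term and a semantic term scored per candidate viewpoint. The toy sketch below is a minimal illustration of that idea, not the paper's actual formulation; the candidate scores, the weight, and the function name are hypothetical.

```python
import numpy as np

def next_best_view(vis_gain, sem_gain, weight=0.5):
    """Score candidate views by a unified visibility + semantic gain and pick the best."""
    total = (1 - weight) * vis_gain + weight * sem_gain
    return int(np.argmax(total)), total

# Hypothetical normalised scores for 4 candidate viewpoints.
vis = np.array([0.9, 0.4, 0.6, 0.2])   # newly visible surface per view
sem = np.array([0.1, 0.8, 0.7, 0.3])   # semantic relevance of what each view sees
idx, scores = next_best_view(vis, sem, weight=0.5)
```

With equal weighting, view 0 (high visibility, low semantics) loses to view 2, which balances both terms, which is exactly the behaviour a purely visibility-driven planner cannot produce.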
https://arxiv.org/abs/2404.16507
Recent advancements in self-supervised learning in the point cloud domain have demonstrated significant potential. However, these methods often suffer from drawbacks, including lengthy pre-training time, the necessity of reconstruction in the input space, or the necessity of additional modalities. In order to address these issues, we introduce Point-JEPA, a joint embedding predictive architecture designed specifically for point cloud data. To this end, we introduce a sequencer that orders point cloud tokens so that token proximity can be computed and utilized efficiently, based on token indices, during target and context selection. The sequencer also allows the token-proximity computation to be shared between context and target selection, further improving efficiency. Experimentally, our method achieves results competitive with state-of-the-art methods while avoiding reconstruction in the input space and additional modalities.
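The sequencer's job is to order tokens so that proximity of indices approximates spatial proximity. One minimal way to obtain such an ordering (purely illustrative; the paper does not specify this exact procedure) is a greedy nearest-neighbour chain over token centre coordinates:

```python
import numpy as np

def sequence_tokens(centers):
    """Greedy nearest-neighbour ordering: nearby indices imply spatially nearby tokens."""
    n = len(centers)
    order = [0]                      # start arbitrarily at token 0
    remaining = set(range(1, n))
    while remaining:
        last = centers[order[-1]]
        nxt = min(remaining, key=lambda j: np.linalg.norm(centers[j] - last))
        order.append(nxt)
        remaining.remove(nxt)
    return order

rng = np.random.default_rng(0)
centers = rng.random((16, 3))        # hypothetical point-patch centre coordinates
order = sequence_tokens(centers)     # a permutation of the 16 token indices
```

After reordering, selecting a contiguous index range yields a spatially coherent block of tokens, so target/context selection can use cheap index arithmetic instead of repeated distance queries.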
https://arxiv.org/abs/2404.16432
While originally developed for novel view synthesis, Neural Radiance Fields (NeRFs) have recently emerged as an alternative to multi-view stereo (MVS). Driven by a manifold of research activities, promising results have been obtained especially for texture-less, transparent, and reflecting surfaces, while such scenarios remain challenging for traditional MVS-based approaches. However, most of these investigations focus on close-range scenarios; studies for airborne scenarios are still missing. For this task, NeRFs face potential difficulties in areas of low image redundancy and weak data evidence, as often found in street canyons, facades, or building shadows. Furthermore, training such networks is computationally expensive. Thus, the aim of our work is twofold: First, we investigate the applicability of NeRFs for aerial image blocks with different characteristics, such as nadir-only, oblique, and high-resolution imagery. Second, during these investigations we demonstrate the benefit of integrating depth priors from tie-point measurements, which are provided by the presupposed bundle block adjustment. Our work is based on the state-of-the-art framework VolSDF, which models 3D scenes by signed distance functions (SDFs), since these are more suitable for surface reconstruction than the standard volumetric representation in vanilla NeRFs. For evaluation, the NeRF-based reconstructions are compared to the results of a publicly available benchmark dataset for airborne images.
https://arxiv.org/abs/2404.16429
In this paper, we study the problem of 3D reconstruction from a single-view RGB image and propose a novel approach called DIG3D for 3D object reconstruction and novel view synthesis. Our method utilizes an encoder-decoder framework in which the decoder generates 3D Gaussians under the guidance of depth-aware image features from the encoder. In particular, we introduce the use of a deformable transformer, allowing efficient and effective decoding through 3D reference points and multi-layer refinement adaptations. By harnessing the benefits of 3D Gaussians, our approach offers an efficient and accurate solution for 3D reconstruction from single-view images. We evaluate our method on the ShapeNet SRN dataset, obtaining PSNRs of 24.21 and 24.98 on the car and chair subsets, respectively. These results outperform the recent state of the art by around 2.25%, demonstrating the effectiveness of our method in achieving superior results.
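The reported PSNR values follow the standard definition, 10·log10(MAX²/MSE), between a rendered view and its ground-truth image. A minimal sketch of that metric (the image data here is synthetic, not from ShapeNet SRN):

```python
import numpy as np

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a rendering and its reference."""
    mse = np.mean((img - ref) ** 2)
    return float(10 * np.log10(max_val ** 2 / mse))

rng = np.random.default_rng(0)
ref = rng.random((32, 32, 3))                                   # synthetic reference
noisy = np.clip(ref + rng.normal(0, 0.01, ref.shape), 0, 1)     # lightly perturbed render
value = psnr(noisy, ref)
```

Gaussian noise with sigma 0.01 gives an MSE near 1e-4 and hence a PSNR around 40 dB, which calibrates how to read scores like 24.21 and 24.98.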
https://arxiv.org/abs/2404.16323
Procedural noise is a fundamental component of computer graphics pipelines, offering a flexible way to generate textures that exhibit "natural" random variation. Many different types of noise exist, each produced by a separate algorithm. In this paper, we present a single generative model which can learn to generate multiple types of noise as well as blend between them. In addition, it is capable of producing spatially-varying noise blends despite not having access to such data for training. These features are enabled by training a denoising diffusion model using a novel combination of data augmentation and network conditioning techniques. Like procedural noise generators, the model's behavior is controllable via interpretable parameters and a source of randomness. We use our model to produce a variety of visually compelling noise textures. We also present an application of our model to improving inverse procedural material design; using our model in place of fixed-type noise nodes in a procedural material graph results in higher-fidelity material reconstructions without needing to know the type of noise in advance.
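As background on what a procedural noise generator looks like, the sketch below implements single-octave value noise, one of the classic noise types such a generative model would learn to imitate; the grid size and seed are the kind of interpretable parameters and randomness source described above. This is generic background, not the paper's diffusion model.

```python
import numpy as np

def value_noise(width, height, grid=8, seed=0):
    """Single-octave value noise: random lattice values, smoothly interpolated."""
    rng = np.random.default_rng(seed)
    lattice = rng.random((grid + 1, grid + 1))        # the source of randomness
    ys = np.linspace(0, grid, height, endpoint=False)
    xs = np.linspace(0, grid, width, endpoint=False)
    y0, x0 = ys.astype(int), xs.astype(int)
    ty, tx = ys - y0, xs - x0
    # smoothstep fade for C1-continuous interpolation between lattice values
    fy = ty * ty * (3 - 2 * ty)
    fx = tx * tx * (3 - 2 * tx)
    a = lattice[np.ix_(y0, x0)]
    b = lattice[np.ix_(y0, x0 + 1)]
    c = lattice[np.ix_(y0 + 1, x0)]
    d = lattice[np.ix_(y0 + 1, x0 + 1)]
    top = a + fx[None, :] * (b - a)
    bot = c + fx[None, :] * (d - c)
    return top + fy[:, None] * (bot - top)

img = value_noise(64, 64)    # one 64x64 noise texture in [0, 1]
```

Changing `grid` changes the feature scale and changing `seed` resamples the variation, mirroring the controllability the learned model aims to preserve.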
https://arxiv.org/abs/2404.16292
Complex single-objective bounded problems are often difficult to solve. Among evolutionary computation methods, the differential evolution (DE) algorithm has been widely studied and developed since its proposal in 1997, owing to its simplicity and efficiency. These developments include various adaptive strategies, operator improvements, and the introduction of other search methods. Since 2014, LSHADE-based research has also received extensive attention. However, although recently proposed improvement strategies show superiority over their predecessors, combining all new strategies does not necessarily yield the strongest performance. Therefore, we recombine effective advances from recent advanced differential evolution variants and determine an effective combination scheme to further improve the performance of differential evolution. In this paper, we propose a strategy recombination and reconstruction differential evolution algorithm, called reconstructed differential evolution (RDE), to solve single-objective bounded optimization problems. Based on the benchmark suite of the 2024 IEEE Congress on Evolutionary Computation (CEC2024), we tested RDE and several other advanced differential evolution variants. The experimental results show that RDE has superior performance in solving complex optimization problems.
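For context, the classic DE/rand/1/bin scheme that this line of work builds on can be sketched in a few lines; the population size and the F and CR settings below are conventional defaults for illustration, not RDE's actual configuration.

```python
import numpy as np

def differential_evolution(f, bounds, pop_size=30, F=0.5, CR=0.9, gens=200, seed=0):
    """Minimize f over a box via the classic DE/rand/1/bin scheme."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    dim = lo.size
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.apply_along_axis(f, 1, pop)
    for _ in range(gens):
        for i in range(pop_size):
            # mutation: base vector plus scaled difference of two others
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i],
                                     3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # binomial crossover, forcing at least one mutated gene
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            ft = f(trial)
            if ft <= fit[i]:                    # greedy one-to-one selection
                pop[i], fit[i] = trial, ft
    best = int(fit.argmin())
    return pop[best], fit[best]

sphere = lambda x: float((x ** 2).sum())        # simple bounded test problem
lo, hi = -5 * np.ones(5), 5 * np.ones(5)
x_best, f_best = differential_evolution(sphere, (lo, hi))
```

Adaptive variants such as LSHADE replace the fixed F and CR with success-history-based adaptation and shrink the population over time; RDE recombines such strategies on top of this core loop.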
https://arxiv.org/abs/2404.16280
We present NeRF-XL, a principled method for distributing Neural Radiance Fields (NeRFs) across multiple GPUs, thus enabling the training and rendering of NeRFs with an arbitrarily large capacity. We begin by revisiting existing multi-GPU approaches, which decompose large scenes into multiple independently trained NeRFs, and identify several fundamental issues with these methods that hinder improvements in reconstruction quality as additional computational resources (GPUs) are used in training. NeRF-XL remedies these issues and enables the training and rendering of NeRFs with an arbitrary number of parameters by simply using more hardware. At the core of our method lies a novel distributed training and rendering formulation, which is mathematically equivalent to the classic single-GPU case and minimizes communication between GPUs. By unlocking NeRFs with arbitrarily large parameter counts, our approach is the first to reveal multi-GPU scaling laws for NeRFs, showing improvements in reconstruction quality with larger parameter counts and speed improvements with more GPUs. We demonstrate the effectiveness of NeRF-XL on a wide variety of datasets, including the largest open-source dataset to date, MatrixCity, containing 258K images covering a 25km^2 city area.
https://arxiv.org/abs/2404.16221
Recent work has found that sparse autoencoders (SAEs) are an effective technique for unsupervised discovery of interpretable features in language models' (LMs) activations, by finding sparse, linear reconstructions of LM activations. We introduce the Gated Sparse Autoencoder (Gated SAE), which achieves a Pareto improvement over training with prevailing methods. In SAEs, the L1 penalty used to encourage sparsity introduces many undesirable biases, such as shrinkage -- systematic underestimation of feature activations. The key insight of Gated SAEs is to separate the functionality of (a) determining which directions to use and (b) estimating the magnitudes of those directions: this enables us to apply the L1 penalty only to the former, limiting the scope of undesirable side effects. Through training SAEs on LMs of up to 7B parameters we find that, in typical hyper-parameter ranges, Gated SAEs solve shrinkage, are similarly interpretable, and require half as many firing features to achieve comparable reconstruction fidelity.
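The key architectural idea, separating which features fire from how strongly they fire, can be sketched as a schematic numpy forward pass with random weights. The weight-sharing scheme (magnitude weights as an elementwise rescaling of the gating weights) is assumed from the description above; this is an illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64

# Hypothetical parameters for a tiny Gated SAE (random, for illustration only).
W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
r_mag = np.zeros(d_sae)          # per-feature rescaling shared with the gate path
b_gate = np.zeros(d_sae)
b_mag = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_dec = np.zeros(d_model)

def gated_sae(x):
    # (a) gating path: binary decision of WHICH features are active
    pi_gate = x @ W_enc + b_gate
    f_gate = (pi_gate > 0).astype(x.dtype)
    # (b) magnitude path: estimate of HOW STRONGLY they are active
    f_mag = np.maximum(x @ (W_enc * np.exp(r_mag)) + b_mag, 0.0)
    f = f_gate * f_mag                     # sparse feature activations
    x_hat = f @ W_dec + b_dec              # linear reconstruction
    # The L1 sparsity penalty touches only the gating pre-activations,
    # so feature magnitudes are not shrunk toward zero.
    l1 = np.maximum(pi_gate, 0.0).sum()
    return x_hat, f, l1

x = rng.normal(size=d_model)
x_hat, f, l1 = gated_sae(x)
```

Because the penalty never sees `f_mag`, gradient pressure toward sparsity cannot systematically underestimate active-feature magnitudes, which is the shrinkage bias the gating removes.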
https://arxiv.org/abs/2404.16014
Recent advancements in 3D reconstruction technologies have paved the way for high-quality and real-time rendering of complex 3D scenes. Despite these achievements, a notable challenge persists: it is difficult to precisely reconstruct specific objects from large scenes. Current scene reconstruction techniques frequently lose object detail textures and cannot reconstruct object portions that are occluded or unseen in the views. To address this challenge, we delve into the meticulous 3D reconstruction of specific objects within large scenes and propose a framework termed OMEGAS: Object Mesh Extraction from Large Scenes Guided by GAussian Segmentation. OMEGAS employs a multi-step approach, grounded in several excellent off-the-shelf methods. Specifically, we first utilize the Segment Anything Model (SAM) to guide the segmentation of 3D Gaussian Splatting (3DGS), thereby creating a basic 3DGS model of the target object. Then, we leverage large-scale diffusion priors to further refine the details of the 3DGS model, especially for object portions that are invisible or occluded in the original scene views. Subsequently, by re-rendering the 3DGS model onto the scene views, we achieve accurate object segmentation and effectively remove the background. Finally, these target-only images are used to further improve the 3DGS model, and the definitive 3D object mesh is extracted by the SuGaR model. In various scenarios, our experiments demonstrate that OMEGAS significantly surpasses existing scene reconstruction methods. Our project page is at: this https URL
https://arxiv.org/abs/2404.15891
Patellofemoral joint (PFJ) issues affect one in four people, with 20% experiencing chronic knee pain despite treatment. Poor outcomes and pain after knee replacement surgery are often linked to patellar mal-tracking. Traditional imaging methods like CT and MRI face challenges, including cost and metal artefacts, and there is currently no ideal way to observe joint motion without issues such as soft tissue artefacts or radiation exposure. A new system to monitor joint motion could significantly improve understanding of PFJ dynamics, aiding in better patient care and outcomes. Combining 2D ultrasound with motion tracking for 3D reconstruction of the joint, using semantic segmentation and position registration, can be a solution. However, the need for expensive external infrastructure to estimate the trajectories of the scanner remains the main limitation to implementing 3D bone reconstruction from handheld ultrasound scanning clinically. We propose Visual-Inertial Odometry (VIO) and a deep-learning-based inertial-only odometry method as alternatives to motion capture for tracking a handheld ultrasound scanner. The 3D reconstructions generated by these methods have demonstrated potential for assessing the PFJ and for further measurements from free-hand ultrasound scans. The results show that the VIO method performs as well as the motion capture method, with average reconstruction errors of 1.25 mm and 1.21 mm, respectively. The VIO method is the first infrastructure-free method for 3D reconstruction of bone from wireless handheld ultrasound scanning, with an accuracy comparable to methods that require external infrastructure.
https://arxiv.org/abs/2404.15847
This paper addresses the challenges associated with hyperspectral image (HSI) reconstruction from miniaturized satellites, which often suffer from stripe effects and are computationally resource-limited. We propose a Real-Time Compressed Sensing (RTCS) network designed to be lightweight and to require relatively few training samples for efficient and robust HSI reconstruction in the presence of the stripe effect and under noisy transmission conditions. The RTCS network features a simplified architecture that reduces the number of required training samples and allows easy implementation on integer-8-based encoders, facilitating rapid compressed sensing of stripe-like HSI, which matches the moderate design of miniaturized satellites with push-broom scanning mechanisms. This contrasts with optimization-based models that demand high-precision floating-point operations, making them difficult to deploy on edge devices. Our encoder employs an integer-8-compatible linear projection for stripe-like HSI data transmission, ensuring real-time compressed sensing. Furthermore, based on a novel two-streamed architecture, an efficient HSI restoration decoder is proposed for the receiver side, allowing reconstruction on edge devices without a sophisticated central server. This is particularly crucial as the growing number of miniaturized satellites would otherwise demand significant computing resources at the ground station. Extensive experiments validate the superior performance of our approach, offering new and vital capabilities for existing miniaturized satellite systems.
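The integer-8-compatible linear projection on the encoder side can be illustrated with a random ±1 sensing matrix applied to one push-broom stripe; the dimensions below (120 spectral bands compressed to 30 measurements) are hypothetical, chosen only to show the idea.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands, n_meas = 120, 30    # hypothetical: bands per pixel, compressed size (4x ratio)

# Integer-8 sensing matrix: random +/-1 entries, storable on an int8-only encoder.
Phi = rng.choice(np.array([-1, 1], dtype=np.int8), size=(n_meas, n_bands))

def encode_stripe(stripe):
    """Compress one push-broom stripe (pixels x bands) with the int8 projection."""
    # accumulate in int32 so the dot product cannot overflow int8
    return stripe.astype(np.int32) @ Phi.T.astype(np.int32)

stripe = rng.integers(0, 256, size=(64, n_bands)).astype(np.uint8)  # one scan line
y = encode_stripe(stripe)    # (64, 30) measurements, transmitted to the ground
```

The ±1 structure keeps the on-board projection a pure integer add/subtract workload, exactly the kind of operation a resource-limited satellite encoder can sustain in real time; the learned decoder at the receiver then inverts the projection.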
https://arxiv.org/abs/2404.15781
Existing NeRF-based inverse rendering methods assume that scenes are exclusively illuminated by distant light sources, neglecting the potential influence of emissive sources within a scene. In this work, we confront this limitation using LDR multi-view images captured with emissive sources turned on and off. Two key issues must be addressed: 1) ambiguity arising from the limited dynamic range along with unknown lighting details, and 2) the expensive computational cost of volume rendering when backtracing the paths leading to final object colors. We present a novel approach, ESR-NeRF, leveraging neural networks as learnable functions to represent ray-traced fields. By training networks to satisfy light transport segments, we regulate outgoing radiances, progressively identifying emissive sources while being aware of reflection areas. The results on scenes encompassing emissive sources with various properties demonstrate the superiority of ESR-NeRF both qualitatively and quantitatively. Our approach also extends its applicability to scenes devoid of emissive sources, achieving lower CD metrics on the DTU dataset.
https://arxiv.org/abs/2404.15707
Proto-form reconstruction has been a painstaking process for linguists. Recently, computational models such as RNN and Transformers have been proposed to automate this process. We take three different approaches to improve upon previous methods, including data augmentation to recover missing reflexes, adding a VAE structure to the Transformer model for proto-to-language prediction, and using a neural machine translation model for the reconstruction task. We find that with the additional VAE structure, the Transformer model has a better performance on the WikiHan dataset, and the data augmentation step stabilizes the training.
https://arxiv.org/abs/2404.15690
Due to the rapid spread of rumors on social media, rumor detection has become an extremely important challenge. Recently, numerous rumor detection models utilizing textual information and the propagation structure of events have been proposed. However, these methods overlook the importance of the semantic evolvement information of events in the propagation process, which is often challenging to truly learn in supervised training paradigms and traditional rumor detection methods. To address this issue, we propose a novel semantic evolvement enhanced Graph Autoencoder for Rumor Detection (GARD) model. The model learns the semantic evolvement information of events by capturing local semantic changes and global semantic evolvement information through specific graph autoencoder and reconstruction strategies. By combining semantic evolvement information with propagation structure information, the model achieves a comprehensive understanding of event propagation and performs accurate and robust detection, while also detecting rumors earlier by capturing semantic evolvement information in the early stages. Moreover, to enhance the model's ability to learn the distinct patterns of rumors and non-rumors, we introduce a uniformity regularizer to further improve the model's performance. Experimental results on three public benchmark datasets confirm the superiority of our GARD method over state-of-the-art approaches in both overall performance and early rumor detection.
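A common form of uniformity regularizer (in the style of Wang and Isola's alignment/uniformity losses for contrastive learning) penalizes embeddings that collapse onto a few directions; whether GARD uses this exact form is an assumption. A minimal sketch:

```python
import numpy as np

def uniformity_loss(z, t=2.0):
    """Log-mean Gaussian kernel over pairwise distances of L2-normalised
    embeddings; lower values mean the embeddings spread more uniformly."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sq = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    iu = np.triu_indices(len(z), k=1)          # distinct pairs only
    return float(np.log(np.exp(-t * sq[iu]).mean()))

rng = np.random.default_rng(0)
spread = rng.normal(size=(64, 8))              # embeddings in many directions
collapsed = spread + 50.0                      # nearly one direction after the shift
loss_spread = uniformity_loss(spread)
loss_collapsed = uniformity_loss(collapsed)    # higher (worse) than loss_spread
```

Adding such a term to the detection loss pushes rumor and non-rumor representations apart on the unit sphere instead of letting them cluster, which is the "distinct patterns" effect described above.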
https://arxiv.org/abs/2404.16076