This article analyzes the use of two parallel multi-objective soft computing algorithms to automatically search for high-quality settings of the Ad hoc On-Demand Distance Vector (AODV) routing protocol for vehicular networks. These methods are based on an evolutionary algorithm and on a swarm intelligence approach. The experimental analysis demonstrates that the configurations computed by our optimization algorithms outperform other state-of-the-art optimized ones. In addition, all the parallel versions achieve a computational efficiency greater than 87%. Therefore, the line of work presented in this article represents an efficient framework for improving vehicular communications.
https://arxiv.org/abs/2501.09725
This study conducts a systematic assessment of the capabilities of 12 machine learning models and model variations in detecting economic ideology. As an evaluation benchmark, I use manifesto data spanning six elections in the United Kingdom, pre-annotated by expert and crowd coders. The analysis assesses the performance of several generative, fine-tuned, and zero-shot models at the granular and aggregate levels. The results show that generative models such as GPT-4o and Gemini 1.5 Flash consistently outperform other models against all benchmarks, but they pose issues of accessibility and resource availability. Fine-tuning yields competitive performance and offers a reliable alternative through domain-specific optimization, although its dependency on training data severely limits scalability. Zero-shot models consistently face difficulties in identifying signals of economic ideology, often producing negative associations with human coding; using general knowledge for the domain-specific task of ideology scaling proved unreliable. Other key findings include considerable within-party variation, fine-tuning benefiting from larger training data, and zero-shot models' sensitivity to prompt content. The assessment covers the strengths and limitations of each model and derives best practices for automated analyses of political content.
https://arxiv.org/abs/2501.09719
Hallucination remains a major challenge for Large Vision-Language Models (LVLMs). Direct Preference Optimization (DPO) has gained increasing attention as a simple solution to hallucination issues: it directly learns from constructed preference pairs that reflect the severity of hallucinations in responses to the same prompt and image. Nonetheless, different data construction methods in existing works bring notable performance variations. We identify a crucial factor here: outcomes are largely contingent on whether the constructed data align on-policy with respect to the initial (reference) policy of DPO. Theoretical analysis suggests that learning from off-policy data is impeded by the presence of KL-divergence between the updated policy and the reference policy. From the perspective of dataset distribution, we systematically summarize the inherent flaws in existing algorithms that employ DPO to address hallucination issues. To alleviate these problems, we propose the On-Policy Alignment (OPA)-DPO framework, which uniquely leverages expert feedback to correct hallucinated responses and aligns both the original and expert-revised responses in an on-policy manner. Notably, with only 4.8k training samples, OPA-DPO achieves an additional reduction in the hallucination rate of LLaVA-1.5-7B compared to the previous SOTA algorithm trained on 16k samples: 13.26% on the AMBER benchmark and 5.39% on the Object-Hal benchmark.
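The per-pair DPO objective the abstract builds on fits in a few lines. The sketch below is a generic scalar DPO loss (the log-probabilities and `beta` are illustrative stand-ins, not values from the paper); it also makes the on-policy point concrete: when training starts on-policy, policy and reference coincide, the margin is zero, and the loss is exactly log 2.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin), where the
    margin compares the policy's preferred-vs-rejected log-prob gap against the
    reference policy's gap."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

Off-policy pairs start with a nonzero margin, which is where the KL gap between updated and reference policy discussed above enters the picture.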
https://arxiv.org/abs/2501.09695
Autonomous docking remains one of the most challenging maneuvers in marine robotics, requiring precise control and robust perception in confined spaces. This paper presents a novel approach integrating Model Predictive Path Integral (MPPI) control with real-time LiDAR-based dock detection for autonomous surface vessel docking. Our framework uniquely combines probabilistic trajectory optimization with a multi-objective cost function that simultaneously considers docking precision, safety constraints, and motion efficiency. The MPPI controller generates optimal trajectories by intelligently sampling control sequences and evaluating their costs based on dynamic clearance requirements, orientation alignment, and target position objectives. We introduce an adaptive dock detection pipeline that processes LiDAR point clouds to extract critical geometric features, enabling real-time updates of docking parameters. The proposed method is extensively validated in a physics-based simulation environment that incorporates realistic sensor noise, vessel dynamics, and environmental constraints. Results demonstrate successful docking from various initial positions while maintaining safe clearances and smooth motion characteristics.
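The MPPI update described above admits a compact sketch: sample perturbations of a nominal control sequence, score each candidate with the cost function, and average the perturbations with exponential weights. This is a generic MPPI step under simplifying assumptions (`sigma`, `lam`, and the cost are placeholders, not the paper's docking cost with clearance and alignment terms):

```python
import numpy as np

def mppi_step(u_nom, cost_fn, n_samples=256, sigma=0.5, lam=1.0, rng=None):
    """One MPPI iteration: perturb the nominal control sequence, evaluate each
    candidate's cost, and return the softmin-weighted average of the samples."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.normal(0.0, sigma, size=(n_samples,) + u_nom.shape)
    costs = np.array([cost_fn(u_nom + e) for e in eps])
    weights = np.exp(-(costs - costs.min()) / lam)  # subtract min for stability
    weights /= weights.sum()
    return u_nom + np.tensordot(weights, eps, axes=1)
```

In the paper's setting, `cost_fn` would bundle the clearance, orientation-alignment, and target-position objectives; here any callable over a control sequence works.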
https://arxiv.org/abs/2501.09668
Traditional in-person psychological counseling remains a niche option, typically sought by individuals already experiencing psychological issues, while online automated counseling offers a potential solution for those hesitant to seek help due to feelings of shame. Cognitive Behavioral Therapy (CBT) is an essential and widely used approach in psychological counseling. The advent of large language models (LLMs) and agent technology enables automatic CBT diagnosis and treatment. However, current LLM-based CBT systems use agents with a fixed structure, which limits their self-optimization capabilities, or provide hollow, unhelpful suggestions due to redundant response patterns. In this work, we utilize Quora-like and YiXinLi single-round consultation models to build a general agent framework that generates high-quality responses for single-turn psychological consultation scenarios. We use a bilingual dataset to evaluate the quality of single-response consultations generated by each framework. Then, we incorporate dynamic routing and supervisory mechanisms inspired by real psychological counseling to construct AutoCBT, a CBT-oriented autonomous multi-agent framework, and demonstrate its general applicability. Experimental results indicate that AutoCBT can provide higher-quality automated psychological counseling services.
https://arxiv.org/abs/2501.09426
Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing tasks. Exploiting the heterogeneous capabilities of edge LLMs is crucial for diverse emerging applications, as it enables greater cost-effectiveness and reduced latency. In this work, we introduce \textit{Mixture-of-Edge-Experts (MoE$^2$)}, a novel collaborative inference framework for edge LLMs. We formulate the joint gating and expert selection problem to optimize inference performance under energy and latency constraints. Unlike conventional MoE problems, LLM expert selection is significantly more challenging due to the combinatorial nature and the heterogeneity of edge LLMs across various attributes. To this end, we propose a two-level expert selection mechanism through which we uncover an optimality-preserving property of gating parameters across expert selections. This property enables the decomposition of the training and selection processes, significantly reducing complexity. Furthermore, we leverage the objective's monotonicity and design a discrete monotonic optimization algorithm for optimal expert selection. We implement edge servers with NVIDIA Jetson AGX Orins and NVIDIA RTX 4090 GPUs, and perform extensive experiments. Our results validate the performance improvements of various LLM models and show that our MoE$^2$ method can achieve optimal trade-offs among different delay and energy budgets, and outperforms baselines under various system resource constraints.
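To make the combinatorial difficulty mentioned above concrete, here is the naive outer problem that the paper's monotonic-optimization algorithm is designed to avoid: brute-force enumeration of expert subsets under energy and latency budgets. The field names and the additive-energy / bottleneck-latency cost model are illustrative assumptions, not the paper's exact formulation:

```python
from itertools import combinations

def select_experts(experts, max_energy, max_latency):
    """Exhaustively pick the expert subset with the highest total utility that
    fits the energy budget (additive) and the latency budget (bottleneck)."""
    best, best_u = None, float("-inf")
    for r in range(1, len(experts) + 1):
        for subset in combinations(experts, r):
            energy = sum(x["energy"] for x in subset)
            latency = max(x["latency"] for x in subset)
            utility = sum(x["utility"] for x in subset)
            if energy <= max_energy and latency <= max_latency and utility > best_u:
                best, best_u = subset, utility
    return best, best_u
```

Enumeration is exponential in the number of edge LLMs, which is exactly why the optimality-preserving gating property and the objective's monotonicity matter: they let the selection be solved without this search.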
https://arxiv.org/abs/2501.09410
This work addresses the path planning problem for a group of unmanned aerial vehicles (UAVs) to maintain a desired formation during operation. Our approach formulates the problem as an optimization task by defining a set of fitness functions that not only ensure the formation but also include constraints for optimal and safe UAV operation. To optimize the fitness function and obtain a suboptimal path, we employ the teaching-learning-based optimization algorithm and then further enhance it with mechanisms such as mutation, elite strategy, and multi-subject combination. A number of simulations and experiments have been conducted to evaluate the proposed method. The results demonstrate that the algorithm successfully generates valid paths for the UAVs to fly in a triangular formation for an inspection task.
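The teaching-learning-based optimization (TLBO) core that the paper enhances can be sketched generically. This is a plain minimization version under stated assumptions: the UAV fitness terms and the paper's enhancements (mutation, elite strategy, multi-subject combination) are omitted, and the test function is a toy stand-in:

```python
import numpy as np

def tlbo(fitness, pop, iters=50, rng=None):
    """Minimal TLBO (minimization). Teacher phase pulls learners toward the
    best solution; learner phase lets random pairs learn from the fitter one.
    Candidates are accepted greedily, so the best fitness never worsens."""
    rng = np.random.default_rng(0) if rng is None else rng
    pop = pop.copy()
    scores = np.apply_along_axis(fitness, 1, pop)
    for _ in range(iters):
        teacher = pop[scores.argmin()].copy()
        tf = int(rng.integers(1, 3))  # teaching factor in {1, 2}
        cand = pop + rng.random(pop.shape) * (teacher - tf * pop.mean(axis=0))
        for i in range(len(pop)):  # teacher phase, greedy acceptance
            if fitness(cand[i]) < scores[i]:
                pop[i], scores[i] = cand[i], fitness(cand[i])
        for i in range(len(pop)):  # learner phase
            j = int(rng.integers(len(pop)))
            if j == i:
                continue
            step = pop[j] - pop[i] if scores[j] < scores[i] else pop[i] - pop[j]
            cand_i = pop[i] + rng.random(pop.shape[1]) * step
            if fitness(cand_i) < scores[i]:
                pop[i], scores[i] = cand_i, fitness(cand_i)
    return pop[scores.argmin()]
```

In the paper, `fitness` would aggregate the formation-keeping terms and the safety/optimality constraints; here any vector-to-scalar function works.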
https://arxiv.org/abs/2501.09357
Understanding the reliability of large language models (LLMs) has recently garnered significant attention. Given LLMs' propensity to hallucinate, as well as their high sensitivity to prompt design, it is already challenging to predict the performance of an individual LLM. However, the problem becomes more complex for compound LLM systems such as cascades, where in addition to each model's standalone performance, we must understand how the error rates of different models interact. In this paper, we present a probabilistic model for the joint performance distribution of a sequence of LLMs, which enables a framework for rationally tuning the confidence thresholds of an LLM cascade using continuous optimization. Compared to selecting confidence thresholds using grid search, our parametric Markov-copula model significantly improves runtime scaling with respect to the length of the cascade and the desired resolution of the cost-error curve, turning these costs from intractable into low-order polynomial. In addition, the optimal thresholds computed using our continuous optimization-based algorithm increasingly outperform those found via grid search as cascade length grows, improving the area under the cost-error curve by 1.9% on average for cascades consisting of at least three models. Overall, our Markov-copula model provides a rational basis for tuning LLM cascade performance and points to the potential of probabilistic methods in analyzing LLM systems.
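A threshold-tuned cascade of the kind being optimized can be simulated in a few lines. This toy routes each query to the first model whose confidence clears its threshold; the per-model tuples are made-up inputs, and the Markov-copula machinery itself is not reproduced here:

```python
def cascade_outcome(queries, thresholds):
    """Simulate a confidence-thresholded LLM cascade.
    queries: one list per input of (confidence, correct, cost) tuples, ordered
    from cheapest to most expensive model. A query escalates past a model whose
    confidence is below that stage's threshold; the last model always answers.
    Returns (error_rate, average_cost) over the batch."""
    errors, costs = [], []
    for per_model in queries:
        spent = 0.0
        for k, (conf, correct, cost) in enumerate(per_model):
            spent += cost
            if k == len(per_model) - 1 or conf >= thresholds[k]:
                errors.append(0.0 if correct else 1.0)
                break
        costs.append(spent)
    return sum(errors) / len(errors), sum(costs) / len(costs)
```

Grid search evaluates a simulator like this at every threshold combination, which blows up with cascade length and curve resolution; the paper's parametric model instead makes the cost-error trade-off a smooth function of the thresholds and tunes it by continuous optimization.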
https://arxiv.org/abs/2501.09345
We consider the protein sequence engineering problem, which aims to find protein sequences with high fitness levels, starting from a given wild-type sequence. Directed evolution has been the dominant paradigm in this field, using an iterative process to generate variants and select them via experimental feedback. We demonstrate that large language models (LLMs), despite being trained on massive texts, are secretly protein sequence optimizers. With a directed evolutionary method, LLMs can perform protein engineering through Pareto and experiment-budget constrained optimization, demonstrating success on both synthetic and experimental fitness landscapes.
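The directed-evolution loop described above is easy to sketch in the abstract: a proposal operator generates variants and a fitness oracle keeps the best. In the paper the proposer is an LLM and fitness comes from synthetic or experimental landscapes; below, both are explicit stand-ins (a random point mutation and whatever toy fitness the caller supplies):

```python
import random

def directed_evolution(wild_type, fitness, propose, rounds=10, batch=8, seed=0):
    """Greedy directed evolution: each round, generate a batch of variants of
    the current best sequence and keep the fittest (incumbent included)."""
    rng = random.Random(seed)
    best = wild_type
    for _ in range(rounds):
        variants = [propose(best, rng) for _ in range(batch)]
        best = max(variants + [best], key=fitness)
    return best

def point_mutation(seq, rng, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Stand-in proposal operator: substitute one random residue."""
    i = rng.randrange(len(seq))
    return seq[:i] + rng.choice(alphabet) + seq[i + 1:]
```

Because the incumbent is always kept, fitness is non-decreasing across rounds; the paper layers Pareto and experiment-budget constraints on top of this basic loop.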
https://arxiv.org/abs/2501.09274
Vision-based tactile sensors have drawn increasing interest in the robotics community. However, traditional lens-based designs impose minimum thickness constraints on these sensors, limiting their applicability in space-restricted settings. In this paper, we propose ThinTact, a novel lensless vision-based tactile sensor with a sensing field of over 200 mm² and a thickness of less than 10 mm. ThinTact utilizes the mask-based lensless imaging technique to map the contact information to CMOS signals. To ensure real-time tactile sensing, we propose a real-time lensless reconstruction algorithm that leverages a frequency-spatial-domain joint filter based on the discrete cosine transform (DCT). This algorithm achieves computation significantly faster than existing optimization-based methods. Additionally, to improve the sensing quality, we develop a mask optimization method based on the genetic algorithm and the corresponding system matrix calibration. We evaluate the performance of our proposed lensless reconstruction and tactile sensing through qualitative and quantitative experiments. Furthermore, we demonstrate ThinTact's practical applicability in diverse applications, including texture recognition and contact-rich object manipulation. The paper will appear in the IEEE Transactions on Robotics: this https URL. Video: this https URL
https://arxiv.org/abs/2501.09273
Recent advancements in large language models (LLMs) have shown promise in medical applications such as disease diagnosis and treatment planning. However, most existing medical LLMs struggle with the advanced reasoning required for complex clinical scenarios, such as differential diagnosis or personalized treatment suggestions. We propose FineMedLM-o1, which leverages high-quality synthetic medical data and long-form reasoning data for Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO), enabling advanced dialogue and deep reasoning capabilities. Additionally, we introduce Test-Time Training (TTT) in the medical domain for the first time, facilitating domain adaptation and ensuring reliable, accurate reasoning. Experimental results demonstrate that FineMedLM-o1 achieves a 23% average performance improvement over prior models on key medical benchmarks. Furthermore, the introduction of TTT provides an additional 14% performance boost, highlighting its effectiveness in enhancing medical reasoning capabilities. To support this process, we also propose a novel method for synthesizing medical dialogue. Compared to other open-source datasets, our dataset stands out as superior in both quality and complexity. The project and data will be released on GitHub.
https://arxiv.org/abs/2501.09213
In this paper, we present an optimization-based framework for generating estimation-aware trajectories in scenarios where measurement (output) uncertainties are state-dependent and set-valued. The framework leverages the concept of regularity for set-valued output maps. Specifically, we demonstrate that, for output-regular maps, one can utilize a set-valued observability measure that is concave with respect to finite-horizon state trajectories. By maximizing this measure, optimized estimation-aware trajectories can be designed for a broad class of systems, including those with locally linearized dynamics. To illustrate the effectiveness of the proposed approach, we provide a representative example in the context of trajectory planning for vision-based estimation. We present an estimation-aware trajectory for an uncooperative target-tracking problem that uses a machine learning (ML)-based estimation module on an ego-satellite.
https://arxiv.org/abs/2501.09192
Medical image anonymization aims to protect patient privacy by removing identifying information, while preserving the data utility to solve downstream tasks. In this paper, we address the medical image anonymization problem with a two-stage solution: latent code projection and optimization. In the projection stage, we design a streamlined encoder to project input images into a latent space and propose a co-training scheme to enhance the projection process. In the optimization stage, we refine the latent code using two deep loss functions designed to address the trade-off between identity protection and data utility dedicated to medical images. Through a comprehensive set of qualitative and quantitative experiments, we showcase the effectiveness of our approach on the MIMIC-CXR chest X-ray dataset by generating anonymized synthetic images that can serve as training set for detecting lung pathologies. Source codes are available at this https URL.
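The optimization stage described above, refining a latent code under a utility/identity trade-off, can be illustrated with a heavily simplified sketch. The two quadratic terms below are stand-ins for the paper's deep losses, and `lam`, `lr`, and the feature vectors are hypothetical; only the structure (gradient descent on the latent with two competing terms) mirrors the text:

```python
import numpy as np

def refine_latent(z0, utility_feat, identity_feat, lam=0.5, lr=0.1, steps=200):
    """Toy optimization stage: gradient descent on a latent code that pulls it
    toward a data-utility feature while penalizing closeness to an identity
    feature. Quadratic stand-ins replace the paper's two deep losses."""
    z = z0.copy()
    for _ in range(steps):
        grad = 2.0 * (z - utility_feat) - 2.0 * lam * (z - identity_feat)
        z -= lr * grad
    return z
```

With `lam < 1` the combined objective stays convex, and the loop converges to the closed-form trade-off point `(utility − lam·identity) / (1 − lam)`; in the actual pipeline the gradients would come from the learned losses instead.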
https://arxiv.org/abs/2501.09114
Targeting the notorious cumulative drift errors in NeRF SLAM, we propose a Semantic-guided Loop Closure with Shared Latent Code, dubbed SLC$^2$-SLAM. In particular, we argue that the latent codes stored in many NeRF SLAM systems are not fully exploited, as they are only used for better reconstruction. In this paper, we propose a simple yet effective way to detect potential loops using the same latent codes as local features. To further improve loop detection performance, we use the semantic information, which is also decoded from the same latent codes, to guide the aggregation of local features. Finally, with the potential loops detected, we close them with a graph optimization followed by bundle adjustment to refine both the estimated poses and the reconstructed scene. To evaluate the performance of our SLC$^2$-SLAM, we conduct extensive experiments on the Replica and ScanNet datasets. Our proposed semantic-guided loop closure significantly outperforms the pre-trained NetVLAD and ORB combined with Bag-of-Words, which are used in all other NeRF SLAM systems with loop closure. As a result, our SLC$^2$-SLAM also demonstrates better tracking and reconstruction performance, especially in larger scenes with more loops, like ScanNet.
https://arxiv.org/abs/2501.08880
The emerging field of vehicular ad hoc networks (VANETs) deals with a set of communicating vehicles which are able to spontaneously interconnect without any pre-existing infrastructure. In such networks, it is crucial to make an optimal configuration of the communication protocols prior to the final network deployment. This way, a human designer can obtain an optimal QoS for the network beforehand. The problem we consider in this work lies in the File Transfer protocol Configuration (FTC), with the aim of optimizing the transmission time, the number of lost packets, and the amount of data transferred in realistic VANET scenarios. We tackle the FTC problem with five representative state-of-the-art optimization techniques and compare their performance. These algorithms are: Particle Swarm Optimization (PSO), Differential Evolution (DE), Genetic Algorithm (GA), Evolutionary Strategy (ES), and Simulated Annealing (SA). For our tests, two typical environment instances of VANETs, for urban and highway scenarios, have been defined. The experiments using ns-2 (a well-known realistic VANET simulator) reveal that PSO outperforms all the compared algorithms for both studied VANET instances.
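Of the five metaheuristics compared, the winner (PSO) is simple enough to sketch. This is a generic global-best PSO for minimization; the inertia and acceleration coefficients are common textbook values, and the real fitness would be an ns-2 simulation scoring transmission time, lost packets, and data volume rather than a closed-form function:

```python
import numpy as np

def pso(fitness, bounds, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization (minimization) within box bounds.
    Each particle blends inertia, attraction to its personal best, and
    attraction to the swarm's global best."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.apply_along_axis(fitness, 1, x)
    for _ in range(iters):
        g = pbest[pbest_f.argmin()]
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.apply_along_axis(fitness, 1, x)
        improved = f < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
    return pbest[pbest_f.argmin()], float(pbest_f.min())
```

For the FTC problem, each coordinate of a particle would encode one protocol parameter, with `bounds` set to the parameter's admissible range.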
https://arxiv.org/abs/2501.08847
Sentiment analysis is one of the most crucial tasks in Natural Language Processing (NLP), involving the training of machine learning models to classify text based on the polarity of opinions. Pre-trained Language Models (PLMs) can be applied to downstream tasks through fine-tuning, eliminating the need to train the model from scratch. Specifically, PLMs have been employed for sentiment analysis, a process that involves detecting, analyzing, and extracting the polarity of text sentiments. Numerous models have been proposed to address this task, with pre-trained PhoBERT-V2 models standing out as the state-of-the-art language models for Vietnamese. The PhoBERT-V2 pre-training approach is based on RoBERTa, optimizing the BERT pre-training method for more robust performance. In this paper, we introduce a novel approach that combines PhoBERT-V2 and SentiWordNet for sentiment analysis of Vietnamese reviews. Our proposed model utilizes PhoBERT-V2 for Vietnamese, offering a robust optimization of the prominent BERT model in the context of the Vietnamese language, and leverages SentiWordNet, a lexical resource explicitly designed to support sentiment classification applications. Experimental results on the VLSP 2016 and AIVIVN 2019 datasets demonstrate that our sentiment analysis system achieves excellent performance in comparison to other models.
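The abstract does not spell out how the PhoBERT-V2 classifier and the SentiWordNet lexicon are combined, so the snippet below is one plausible late-fusion scheme and should be read purely as an assumption: blend the classifier's label probabilities with a distribution derived from the average lexicon polarity of the tokens (`alpha` weights the model side).

```python
def combine_scores(model_probs, lexicon_scores, alpha=0.7):
    """Hypothetical late fusion of classifier probabilities (negative, neutral,
    positive) with per-token lexicon polarities in [-1, 1]. Returns the label
    with the highest blended score."""
    labels = ("negative", "neutral", "positive")
    lex = sum(lexicon_scores) / len(lexicon_scores) if lexicon_scores else 0.0
    # map average polarity to a crude distribution over the three labels
    lex_probs = (max(0.0, -lex), 1.0 - abs(lex), max(0.0, lex))
    blended = [alpha * m + (1.0 - alpha) * l for m, l in zip(model_probs, lex_probs)]
    return labels[max(range(3), key=lambda i: blended[i])]
```

In a real system, `model_probs` would come from the fine-tuned PhoBERT-V2 softmax head and `lexicon_scores` from SentiWordNet lookups of the review's tokens; the exact fusion used in the paper may differ.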
https://arxiv.org/abs/2501.08758
Dynamic MRI reconstruction, an inverse problem, has seen a surge of progress driven by deep learning techniques. In particular, the practical difficulty of obtaining ground truth data has led to the emergence of unsupervised learning approaches. A recent promising method among them is implicit neural representation (INR), which defines the data as a continuous function that maps coordinate values to the corresponding signal values. This allows for filling in missing information with only incomplete measurements and solving the inverse problem effectively. Nevertheless, previous works incorporating this method have faced drawbacks such as long optimization time and the need for extensive hyperparameter tuning. To address these issues, we propose Dynamic-Aware INR (DA-INR), an INR-based model for dynamic MRI reconstruction that captures the spatial and temporal continuity of dynamic MRI data in the image domain and explicitly incorporates the temporal redundancy of the data into the model structure. As a result, DA-INR outperforms other models in reconstruction quality even at extreme undersampling ratios, while significantly reducing optimization time and requiring minimal hyperparameter tuning.
https://arxiv.org/abs/2501.09049
Mobile robot fleets are currently used in different scenarios such as medical environments or logistics. The management of these systems poses challenges that range from controlling the movement of each robot to allocating the tasks to be performed. The Task Allocation (TA) problem is a key topic in the proper management of mobile robot fleets, as it seeks to minimize both energy consumption and the number of robots required. Solutions in this area are essential to reach the economic and environmental sustainability of robot fleets, mainly in industrial applications such as warehouse logistics. The minimization of energy consumption casts TA as an optimization problem, which has been treated in recent studies. This work focuses on the analysis of current trends in solving the TA problem for mobile robot fleets. The main TA optimization algorithms are presented, including novel methods based on Artificial Intelligence (AI). Additionally, this work showcases the most important results extracted from simulations, including the frameworks utilized for their development. Finally, some conclusions are drawn from the analysis to target the gaps that must be addressed in the future.
https://arxiv.org/abs/2501.08726
Visual odometry (VO) plays a crucial role in autonomous driving, robotic navigation, and other related tasks by estimating the position and orientation of a camera based on visual input. Significant progress has been made in data-driven VO methods, particularly those leveraging deep learning techniques to extract image features and estimate camera poses. However, these methods often struggle in low-light conditions because of the reduced visibility of features and the increased difficulty of matching keypoints. To address this limitation, we introduce BrightVO, a novel VO model based on Transformer architecture, which not only performs front-end visual feature extraction, but also incorporates a multi-modality refinement module in the back-end that integrates Inertial Measurement Unit (IMU) data. Using pose graph optimization, this module iteratively refines pose estimates to reduce errors and improve both accuracy and robustness. Furthermore, we create a synthetic low-light dataset, KiC4R, which includes a variety of lighting conditions to facilitate the training and evaluation of VO frameworks in challenging environments. Experimental results demonstrate that BrightVO achieves state-of-the-art performance on both the KiC4R dataset and the KITTI benchmarks. Specifically, it provides an average improvement of 20% in pose estimation accuracy in normal outdoor environments and 259% in low-light conditions, outperforming existing methods. For widespread use and further development, the research work is fully open-source at this https URL.
https://arxiv.org/abs/2501.08659
This paper summarizes in depth the state of the art of aerial swarms, covering both classical and new reinforcement-learning-based approaches for their management. Then, it proposes a hybrid AI system, integrating deep reinforcement learning in a multi-agent centralized swarm architecture. The proposed system is tailored to perform surveillance of a specific area, searching and tracking ground targets, for security and law enforcement applications. The swarm is governed by a central swarm controller responsible for distributing different search and tracking tasks among the cooperating UAVs. Each UAV agent is then controlled by a collection of cooperative sub-agents, whose behaviors have been trained using different deep reinforcement learning models, tailored for the different task types proposed by the swarm controller. More specifically, proximal policy optimization (PPO) algorithms were used to train the agents' behavior. In addition, several metrics to assess the performance of the swarm in this application were defined. The results obtained through simulation show that our system searches the operation area effectively, acquires the targets in a reasonable time, and is capable of tracking them continuously and consistently.
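The PPO objective used to train the sub-agents has a one-line core; below is the standard clipped surrogate for a single sample (a generic formula, not the paper's full training setup with value baselines, batching, and the swarm-specific rewards):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate for one (state, action) sample:
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A),
    where ratio = pi_new(a|s) / pi_old(a|s) and A is the advantage estimate."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

Clipping removes the incentive to push the policy ratio outside [1 − ε, 1 + ε], which is what keeps the cooperative sub-agents' updates stable during training.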
https://arxiv.org/abs/2501.08655