Cellular automata have become a cornerstone for investigating emergence and self-organization across diverse scientific disciplines, spanning neuroscience, artificial life, and theoretical physics. However, the absence of a hardware-accelerated cellular automata library limits the exploration of new research directions, hinders collaboration, and impedes reproducibility. In this work, we introduce CAX (Cellular Automata Accelerated in JAX), a high-performance and flexible open-source library designed to accelerate cellular automata research. CAX offers cutting-edge performance and a modular design through a user-friendly interface, and supports both discrete and continuous cellular automata in any number of dimensions. We demonstrate CAX's performance and flexibility through a wide range of benchmarks and applications. From classic models such as elementary cellular automata and Conway's Game of Life to advanced applications such as growing neural cellular automata and self-classifying MNIST digits, CAX runs simulations up to 2,000 times faster. Furthermore, we demonstrate CAX's potential to accelerate research by presenting a collection of three novel cellular automata experiments, each implemented in just a few lines of code thanks to the library's modular architecture. Notably, we show that a simple one-dimensional cellular automaton can outperform GPT-4 on the 1D-ARC challenge.
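To illustrate why hardware acceleration matters here, the following minimal sketch runs Conway's Game of Life as a jit-compiled JAX rollout; it is illustrative only and does not use CAX's actual API.

```python
# Minimal hardware-accelerated cellular automaton in plain JAX (not CAX's API):
# a jit-compiled Game of Life rollout on a toroidal grid.
from functools import partial
import jax
import jax.numpy as jnp

def life_step(grid):
    """One synchronous update of Conway's Game of Life on a toroidal grid."""
    # Count the 8 neighbours by summing shifted copies of the grid.
    neighbours = sum(
        jnp.roll(grid, (dy, dx), axis=(0, 1))
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    born = (grid == 0) & (neighbours == 3)
    survives = (grid == 1) & ((neighbours == 2) | (neighbours == 3))
    return (born | survives).astype(grid.dtype)

@partial(jax.jit, static_argnums=1)
def simulate(grid, steps):
    # lax.fori_loop keeps the whole rollout inside one compiled XLA program.
    return jax.lax.fori_loop(0, steps, lambda _, g: life_step(g), grid)

key = jax.random.PRNGKey(0)
grid = jax.random.bernoulli(key, 0.3, (256, 256)).astype(jnp.uint8)
print("alive cells after 100 steps:", int(simulate(grid, 100).sum()))
```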
https://arxiv.org/abs/2410.02651
Predictive business process analytics has become important for organizations, offering real-time operational support for their processes. However, these algorithms often produce unfair predictions because they rely on biased variables (e.g., gender or nationality), namely variables embodying discrimination. This paper addresses the challenge of integrating a debiasing phase into predictive business process analytics to ensure that predictions are not influenced by biased variables. Our framework, which leverages adversarial debiasing, is evaluated on four case studies, showing a significant reduction in the contribution of biased variables to the predicted value. The proposed technique is also compared with the state of the art in fairness in process mining, illustrating that our framework achieves a higher level of fairness while retaining better prediction quality.
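For readers unfamiliar with the underlying idea, here is a minimal, generic adversarial-debiasing sketch on toy data (a predictor trained to fool an adversary that tries to recover the protected attribute from its output); the paper's actual framework, features, and case studies are not reproduced.

```python
# Generic adversarial debiasing on toy data (in the spirit of Zhang et al., 2018),
# NOT the paper's exact framework: an adversary learns to predict the protected
# attribute z from the predictor's output, and the predictor is penalised for it.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 512, 16
x = torch.randn(n, d)                        # process features (toy)
z = torch.randint(0, 2, (n, 1)).float()      # protected attribute (toy)
y = ((x[:, :1] + 0.5 * z) > 0).float()       # biased outcome (toy)

predictor = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
adversary = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-2)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()
lam = 1.0  # strength of the debiasing term

for step in range(200):
    # 1) train the adversary to recover z from the (detached) prediction
    y_hat = predictor(x).detach()
    opt_a.zero_grad()
    bce(adversary(y_hat), z).backward()
    opt_a.step()

    # 2) train the predictor on the task while *fooling* the adversary
    y_hat = predictor(x)
    loss = bce(y_hat, y) - lam * bce(adversary(y_hat), z)
    opt_p.zero_grad()
    loss.backward()
    opt_p.step()
```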
https://arxiv.org/abs/2410.02618
Unsupervised constituency parsers organize phrases within a sentence into a tree-shaped syntactic constituent structure that reflects the organization of sentence semantics. However, the traditional objective of maximizing sentence log-likelihood (LL) does not explicitly account for the close relationship between the constituent structure and the semantics, resulting in a weak correlation between LL values and parsing accuracy. In this paper, we introduce a novel objective for training unsupervised parsers: maximizing the information between constituent structures and sentence semantics (SemInfo). We introduce a bag-of-substrings model to represent the semantics and apply the probability-weighted information metric to estimate the SemInfo. Additionally, we develop a Tree Conditional Random Field (TreeCRF)-based model to apply the SemInfo maximization objective to Probabilistic Context-Free Grammar (PCFG) induction, the state-of-the-art method for unsupervised constituency parsing. Experiments demonstrate that SemInfo correlates more strongly with parsing accuracy than LL. Our algorithm significantly enhances parsing accuracy by an average of 7.85 points across five PCFG variants and in four languages, achieving new state-of-the-art results in three of the four languages.
https://arxiv.org/abs/2410.02558
With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty attracts increasing attention, the redundant effort caused by exploration without proper guidance poses a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, that channels informative task-relevant guidance from a knowledgeable Large Language Model (LLM) for Efficient Multi-Agent Exploration. Specifically, we ground linguistic knowledge from the LLM into symbolic key states that are critical for task fulfillment, in a discriminative manner and at low LLM inference cost. To unleash the power of key states, we design a Subspace-based Hindsight Intrinsic Reward (SHIR) to guide agents toward key states by increasing reward density. Additionally, we build a Key State Memory Tree (KSMT) to track transitions between key states in a specific task for organized exploration. By reducing redundant exploration, LEMAE outperforms existing SOTA approaches on challenging benchmarks (e.g., SMAC and MPE) by a large margin, achieving a 10x acceleration in certain scenarios.
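As a heavily simplified illustration of key-state-guided reward shaping (the intuition behind SHIR), the toy function below adds an intrinsic bonus when a trajectory reaches its next key state; the paper's actual subspace construction, hindsight mechanism, and KSMT are not reproduced.

```python
# Toy key-state reward shaping: transitions that reach the next unvisited key
# state earn an intrinsic bonus, increasing reward density. This is a sketch of
# the intuition only, not the paper's SHIR.
def shaped_rewards(trajectory, key_states, bonus=1.0):
    """trajectory: list of (state, env_reward); key_states: ordered list of key states."""
    rewards, next_key = [], 0
    for state, env_reward in trajectory:
        intrinsic = 0.0
        if next_key < len(key_states) and state == key_states[next_key]:
            intrinsic = bonus          # bonus for reaching the next key state
            next_key += 1
        rewards.append(env_reward + intrinsic)
    return rewards

# Toy usage: the environment reward is sparse (goal only); the shaped reward is denser.
traj = [("start", 0.0), ("door_open", 0.0), ("bridge_crossed", 0.0), ("goal", 1.0)]
print(shaped_rewards(traj, key_states=["door_open", "bridge_crossed", "goal"]))
# [0.0, 1.0, 1.0, 2.0]
```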
https://arxiv.org/abs/2410.02511
The rapid advancements in autonomous vehicle software present both opportunities and challenges, especially in enhancing road safety. The primary objective of autonomous vehicles is to reduce accident rates through improved safety measures. However, the integration of new algorithms, such as Artificial Intelligence methods, into autonomous vehicles raises concerns about compliance with established safety regulations. This paper introduces a novel software architecture based on behavior trees, aligned with established standards and designed to supervise vehicle functional safety in real time. It specifically addresses the integration of algorithms into industrial road vehicles in adherence to ISO 26262. The proposed supervision methodology involves detecting hazards and ensuring compliance with functional and technical safety requirements when a hazard arises. This methodology, implemented in this study on a Renault Mégane (currently at SAE level 3 of automation), not only guarantees compliance with safety standards but also paves the way for safer and more reliable autonomous driving technologies.
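To make the behavior-tree idea concrete, here is a minimal, hypothetical supervisor sketch: a selector ticks the nominal "no hazard" condition first and otherwise runs a safety-reaction sequence. Node names and the hazard check are illustrative, not the paper's ISO 26262 implementation.

```python
# Minimal behaviour-tree supervisor sketch (illustrative node names, not the
# paper's implementation): a selector falls back to a safety sequence on hazard.
SUCCESS, FAILURE = "SUCCESS", "FAILURE"

class Condition:
    def __init__(self, name, predicate):
        self.name, self.predicate = name, predicate
    def tick(self, bb):
        return SUCCESS if self.predicate(bb) else FAILURE

class Action:
    def __init__(self, name, effect):
        self.name, self.effect = name, effect
    def tick(self, bb):
        self.effect(bb)
        return SUCCESS

class Sequence:            # succeeds only if every child succeeds, in order
    def __init__(self, children):
        self.children = children
    def tick(self, bb):
        for child in self.children:
            if child.tick(bb) == FAILURE:
                return FAILURE
        return SUCCESS

class Selector:            # tries children until one succeeds
    def __init__(self, children):
        self.children = children
    def tick(self, bb):
        for child in self.children:
            if child.tick(bb) == SUCCESS:
                return SUCCESS
        return FAILURE

supervisor = Selector([
    Condition("no_hazard", lambda bb: not bb["hazard"]),
    Sequence([
        Action("limit_speed", lambda bb: bb.update(target_speed=0.0)),
        Action("request_takeover", lambda bb: bb.update(driver_alerted=True)),
    ]),
])

blackboard = {"hazard": True, "target_speed": 25.0, "driver_alerted": False}
supervisor.tick(blackboard)     # in a real system the tree is ticked at a fixed rate
print(blackboard)               # speed limited and take-over requested
```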
https://arxiv.org/abs/2410.02469
Mamba, a special case of the State Space Model, is gaining popularity as an alternative to transformer-based deep learning approaches in medical image analysis. While transformers are powerful architectures, they have drawbacks, including quadratic computational complexity and an inability to address long-range dependencies efficiently. This limitation affects the analysis of large and complex datasets in medical imaging, where there are many spatial and temporal relationships. In contrast, Mamba offers benefits that make it well suited for medical image analysis. It has linear time complexity, which is a significant improvement over transformers. Mamba processes longer sequences without attention mechanisms, enabling faster inference and requiring less memory. Mamba also demonstrates strong performance in merging multimodal data, improving diagnostic accuracy and patient outcomes. The organization of this paper allows readers to appreciate the capabilities of Mamba in medical imaging step by step. We begin by defining core concepts of SSMs and related models, including S4, S5, and S6, followed by an exploration of Mamba architectures such as pure Mamba, U-Net variants, and hybrid models with convolutional neural networks, transformers, and Graph Neural Networks. We also cover Mamba optimizations, techniques and adaptations, scanning strategies, datasets, applications, and experimental results, and conclude with the challenges and future directions of Mamba in medical imaging. This review aims to demonstrate the transformative potential of Mamba in overcoming existing barriers within medical imaging while paving the way for innovative advancements in the field. A comprehensive list of Mamba architectures applied in the medical field, reviewed in this work, is available on GitHub.
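As background for readers new to SSMs, the toy example below discretizes a linear state space model (the S4-style building block) with zero-order hold and processes a sequence with a linear-time recurrence; the selective, input-dependent parameters of Mamba/S6 are omitted.

```python
# Toy discretized state space model (the S4-style building block): continuous
# parameters (A, B, C) are discretized with zero-order hold and the sequence is
# processed by a linear-time recurrence. Selective parameters (Mamba/S6) omitted.
import numpy as np

rng = np.random.default_rng(0)
d_state, seq_len = 4, 64
A = -np.diag(rng.uniform(0.5, 1.5, d_state))     # stable diagonal state matrix
B = rng.standard_normal((d_state, 1))
C = rng.standard_normal((1, d_state))
dt = 0.1

# Zero-order-hold discretization: A_bar = exp(dt*A), B_bar = A^{-1}(A_bar - I)B
A_bar = np.diag(np.exp(dt * np.diag(A)))
B_bar = np.linalg.inv(A) @ (A_bar - np.eye(d_state)) @ B

u = rng.standard_normal(seq_len)                 # 1-D input sequence
x = np.zeros((d_state, 1))
y = np.empty(seq_len)
for k in range(seq_len):                         # O(seq_len) recurrence
    x = A_bar @ x + B_bar * u[k]
    y[k] = (C @ x).item()

print("SSM output on the toy sequence:", y[:5])
```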
https://arxiv.org/abs/2410.02362
The coastal underwater evidence search system with surface-underwater collaboration is designed to revolutionize the search for artificial objects in coastal underwater environments, overcoming limitations associated with traditional methods such as divers and tethered remotely operated vehicles. Our innovative multi-robot collaborative system consists of three parts: an autonomous surface vehicle serving as a mission control center, a towed underwater vehicle for wide-area search, and a biomimetic underwater robot inspired by marine organisms for detailed inspections of identified areas. We conduct extensive simulations and real-world experiments in pond environments and coastal fields to demonstrate the system's potential to surpass the limitations of conventional underwater search methods, offering a robust and efficient solution for law enforcement and recovery operations in marine settings.
https://arxiv.org/abs/2410.02345
This paper reviews published research in the field of computer-aided colorization technology. We argue that the colorization task originates from computer graphics, prospers with the introduction of computer vision, and tends toward the fusion of vision and graphics; accordingly, we put forward our taxonomy and organize the whole paper chronologically. We extend the existing reconstruction-based colorization evaluation techniques, arguing that aesthetic assessment of colored images should be introduced to ensure that colorization satisfies human visual-related requirements and emotions more closely. We perform this colorization aesthetic assessment on seven representative unconditional colorization models and discuss the differences between our assessment and the existing reconstruction-based metrics. Finally, this paper identifies unresolved issues and proposes fruitful areas for future research and development. Access to the project associated with this survey can be obtained at this https URL.
https://arxiv.org/abs/2410.02288
Convolutional neural networks (CNNs) have shown great effectiveness in medical image segmentation. However, they may be limited in modeling large inter-subject variations in organ shapes and sizes and in exploiting global long-range contextual information. This is because CNNs typically employ convolutions with fixed-sized local receptive fields and lack mechanisms to utilize global information. To address these limitations, we developed Dynamic Multi-Resolution Convolution (DMRC) and Dynamic Multi-Scale Convolution (DMSC) modules. Both modules enhance the representation capability of a single convolution to capture features at varying scales along with global contextual information. The DMRC module achieves this by employing a convolutional filter on images at different resolutions and subsequently utilizing dynamic mechanisms to model global inter-dependencies between features. In contrast, the DMSC module extracts features at different scales by employing convolutions with different kernel sizes and utilizing dynamic mechanisms to extract global contextual information. The use of convolutions with different kernel sizes in the DMSC module may increase computational complexity. To lessen this burden, we propose a lightweight design for the convolution layers with a large kernel size. Thus, the DMSC and DMRC modules are designed as lightweight drop-in replacements for single convolutions, and they can be easily integrated into general CNN architectures for end-to-end training. We propose a segmentation network that incorporates our DMSC and DMRC modules into a standard U-Net architecture, termed the Dynamic Multi-scale and Multi-resolution Convolution network (DMC-Net). The results demonstrate that our proposed DMSC and DMRC modules can enhance the representation capabilities of single convolutions and improve segmentation accuracy.
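A simplified, illustrative reading of the multi-scale idea is sketched below: parallel convolutions with different kernel sizes, depthwise convolution to keep large kernels lightweight, and a squeeze-and-excitation-style gate for global context. This is our own sketch, not the authors' exact DMSC/DMRC modules.

```python
# Simplified multi-scale convolution block with a dynamic (SE-style) global gate.
# Illustrative sketch only, not the authors' DMSC/DMRC modules.
import torch
import torch.nn as nn

class MultiScaleDynamicConv(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7), reduction=4):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2,
                      groups=channels if k > 3 else 1)     # depthwise for large kernels
            for k in kernel_sizes
        ])
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels, 1)
        self.gate = nn.Sequential(                          # global (SE-style) context
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        fused = self.fuse(multi_scale)
        return fused * self.gate(fused) + x                 # gated features + residual

block = MultiScaleDynamicConv(32)
out = block(torch.randn(2, 32, 64, 64))
print(out.shape)  # torch.Size([2, 32, 64, 64])
```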
https://arxiv.org/abs/2410.02129
Teaching text-to-image models to be creative involves using style ambiguity loss. In this work, we explore using the style ambiguity training objective, used to approximate creativity, on a diffusion model. We then experiment with forms of style ambiguity loss that do not require training a classifier or a labeled dataset, and find that the models trained with style ambiguity loss can generate better images than the baseline diffusion models and GANs. Code is available at this https URL.
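For context, a common form of the style-ambiguity objective (in the spirit of Creative Adversarial Networks) penalizes the generator unless a style classifier's distribution over styles is close to uniform; the sketch below shows that loss in isolation and does not cover the paper's classifier-free variants.

```python
# Style-ambiguity loss sketch: cross-entropy between a style classifier's output
# and the uniform distribution, minimised when no single style dominates.
# Illustrative only; the paper's classifier-free variants are not shown.
import torch
import torch.nn.functional as F

def style_ambiguity_loss(style_logits):
    """Cross-entropy between the classifier's style distribution and uniform."""
    k = style_logits.shape[-1]
    uniform = torch.full_like(style_logits, 1.0 / k)
    log_probs = F.log_softmax(style_logits, dim=-1)
    return -(uniform * log_probs).sum(dim=-1).mean()

# Toy usage: logits from a (frozen) style classifier applied to generated images.
logits = torch.randn(8, 10)            # batch of 8 images, 10 style classes
print(style_ambiguity_loss(logits))    # minimised when the distribution is uniform
```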
https://arxiv.org/abs/2410.02055
De-identification is important in protecting patients' privacy for healthcare text analytics. The MASK framework is one of the best-performing systems on the de-identification shared task organised by the n2c2/i2b2 challenges. This work enhances the MASK framework by integrating ClinicalBERT, a deep learning model specifically fine-tuned on clinical texts, alongside traditional de-identification methods such as dictionary lookup and rule-based approaches. The system effectively identifies and either redacts or replaces sensitive identifiable entities within clinical documents, while also allowing users to customise the masked documents according to their specific needs. The integration of ClinicalBERT significantly improves the performance of entity recognition, achieving a 0.9732 F1-score, especially for common entities such as names, dates, and locations. A risk assessment feature has also been developed, which analyses the uniqueness of context within documents to classify them into risk levels, guiding further de-identification efforts. While the system demonstrates strong overall performance, this work highlights areas for future improvement, including handling more complex entity occurrences and enhancing the system's adaptability to different clinical settings.
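As a minimal illustration of the rule-based components mentioned above (dictionary lookup and pattern rules), the sketch below redacts names and dates from a toy note; the ClinicalBERT NER model used by the actual system is not reproduced here.

```python
# Toy dictionary-lookup / rule-based redaction sketch: replace matched names and
# dates with placeholder tags. Not the MASK framework's actual pipeline.
import re

NAME_DICTIONARY = {"John Smith", "Jane Doe"}          # toy lookup dictionary
DATE_PATTERN = re.compile(r"\b\d{1,2} (January|February|March|April|May|June|July|"
                          r"August|September|October|November|December) \d{4}\b")

def redact(text):
    for name in NAME_DICTIONARY:                       # dictionary lookup
        text = text.replace(name, "[NAME]")
    text = DATE_PATTERN.sub("[DATE]", text)            # rule-based date masking
    return text

note = "John Smith was admitted on 12 March 2023 and discharged on 20 March 2023."
print(redact(note))   # "[NAME] was admitted on [DATE] and discharged on [DATE]."
```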
https://arxiv.org/abs/2410.01648
Neural Radiance Fields (NeRF) are widely used for novel-view synthesis and have been adapted for 3D Object Detection (3DOD), offering a promising approach to 3DOD through view-synthesis representation. However, NeRF faces inherent limitations: (i) limited representational capacity for 3DOD due to its implicit nature, and (ii) slow rendering speeds. Recently, 3D Gaussian Splatting (3DGS) has emerged as an explicit 3D representation that addresses these limitations. Inspired by these advantages, this paper introduces 3DGS into 3DOD for the first time, identifying two main challenges: (i) Ambiguous spatial distribution of Gaussian blobs: 3DGS primarily relies on 2D pixel-level supervision, resulting in an unclear 3D spatial distribution of Gaussian blobs and poor differentiation between objects and background, which hinders 3DOD; (ii) Excessive background blobs: 2D images often include numerous background pixels, leading to densely reconstructed 3DGS with many noisy Gaussian blobs representing the background, negatively affecting detection. To tackle challenge (i), we leverage the fact that 3DGS reconstruction is derived from 2D images, and propose an elegant and efficient solution that incorporates 2D Boundary Guidance to significantly enhance the spatial distribution of Gaussian blobs, resulting in clearer differentiation between objects and their background. To address challenge (ii), we propose a Box-Focused Sampling strategy that uses 2D boxes to generate object probability distributions in 3D space, allowing effective probabilistic sampling in 3D to retain more object blobs and reduce noisy background blobs. Benefiting from our designs, our 3DGS-DET significantly outperforms the SOTA NeRF-based method, NeRF-Det, achieving improvements of +6.6 on mAP@0.25 and +8.1 on mAP@0.5 on the ScanNet dataset, and an impressive +31.5 on mAP@0.25 on the ARKITScenes dataset.
https://arxiv.org/abs/2410.01647
Robot-assisted surgery has profoundly influenced current forms of minimally invasive surgery. However, transurethral and suburethral urological surgical robots need to work in a liquid environment, where shearing and heating vaporize the liquid, producing bubble atomization that degrades the robot's visual perception. This can force pauses in the surgical procedure, prolonging the surgery. To address the atomization characteristics of liquids under urological surgical robotic vision, we propose an unsupervised zero-shot dehaze method (RSF-Dehaze) for urological surgical robotic vision. Specifically, the proposed Region Similarity Filling Module (RSFM) of RSF-Dehaze significantly improves the recovery of blurred region tissues. In addition, we organize and propose a dehaze dataset for robotic vision in urological surgery (the USRobot-Dehaze dataset). In particular, this dataset covers the three most common urological surgical robot operation scenarios. To the best of our knowledge, we are the first to organize and propose a publicly available dehaze dataset for urological surgical robot vision. Extensive comparative experiments with 20 classical and state-of-the-art dehazing and image recovery algorithms across the three urological surgical robot operation scenarios demonstrate the effectiveness of RSF-Dehaze. The source code and dataset are available at this https URL.
https://arxiv.org/abs/2410.01395
Machine-learning-based generation of process models from natural language process descriptions provides a solution for the time-intensive and expensive process discovery phase. Many organizations have to carry out this phase before they can utilize business process management and its benefits. Yet, research in this direction is severely constrained by an apparent lack of large and high-quality datasets. This lack of data can be attributed, among other things, to the absence of proper tool assistance for dataset creation, resulting in high workloads and inferior data quality. We explore two assistance features to support dataset creation: a recommendation system for identifying process information in the text, and visualization of the current state of already identified process information as a graphical business process model. A controlled user study with 31 participants shows that assisting dataset creators with recommendations lowers all aspects of workload, by up to $-51.0\%$, and significantly improves annotation quality, by up to $+38.9\%$. We make all data and code available to encourage further research on additional novel assistance strategies.
https://arxiv.org/abs/2410.01356
Hierarchical text classification (HTC) is the task of assigning labels to a text within a structured space organized as a hierarchy. Recent works treat HTC as a conventional multilabel classification problem and therefore evaluate it as such. We instead propose to evaluate models using specifically designed hierarchical metrics, and we demonstrate the intricacy of the choice of metric and prediction inference method. We introduce a new challenging dataset and fairly evaluate recent sophisticated models, comparing them with a range of simple but strong baselines, including a new theoretically motivated loss. Finally, we show that these baselines are very often competitive with the latest models. This highlights the importance of carefully considering the evaluation methodology when proposing new methods for HTC. Code implementation and dataset are available at \url{this https URL}.
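One widely used family of hierarchical metrics augments predicted and gold label sets with their ancestors before computing precision/recall/F1; the sketch below illustrates that idea on a toy hierarchy and is not necessarily the exact metric used in the paper.

```python
# Ancestor-augmented hierarchical F1 sketch: predictions that land near the gold
# label in the hierarchy earn partial credit. One common hierarchical metric,
# not necessarily the paper's exact choice.
def ancestors(label, parent):
    """All ancestors of a label, following the parent map up to the root."""
    result = set()
    while label in parent:
        label = parent[label]
        result.add(label)
    return result

def hierarchical_f1(pred, gold, parent):
    pred_aug = set(pred) | set().union(*(ancestors(l, parent) for l in pred))
    gold_aug = set(gold) | set().union(*(ancestors(l, parent) for l in gold))
    tp = len(pred_aug & gold_aug)
    precision, recall = tp / len(pred_aug), tp / len(gold_aug)
    return 2 * precision * recall / (precision + recall) if tp else 0.0

# Toy hierarchy: science -> {physics, biology}. Predicting the sibling class still
# earns partial credit because both labels share the ancestor "science".
parent = {"physics": "science", "biology": "science"}
print(hierarchical_f1(pred={"physics"}, gold={"biology"}, parent=parent))  # 0.5
```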
https://arxiv.org/abs/2410.01305
The task of action spotting consists in both identifying actions and precisely localizing them in time with a single timestamp in long, untrimmed video streams. Automatically extracting those actions is crucial for many sports applications, including sports analytics to produce extended statistics on game actions, coaching to provide support to video analysts, or fan engagement to automatically overlay content in the broadcast when specific actions occur. However, before 2018, no large-scale datasets for action spotting in sports were publicly available, which impeded benchmarking action spotting methods. In response, our team built the largest dataset and the most comprehensive benchmarks for sports video understanding, under the umbrella of SoccerNet. Particularly, our dataset contains a subset specifically dedicated to action spotting, called SoccerNet Action Spotting, containing more than 550 complete broadcast games annotated with almost all types of actions that can occur in a football game. This dataset is tailored to develop methods for automatic spotting of actions of interest, including deep learning approaches, by providing a large amount of manually annotated actions. To engage with the scientific community, the SoccerNet initiative organizes yearly challenges, during which participants from all around the world compete to achieve state-of-the-art performances. Thanks to our dataset and challenges, more than 60 methods were developed or published over the past five years, improving on the first baselines and making action spotting a viable option for the sports industry. This paper traces the history of action spotting in sports, from the creation of the task back in 2018, to the role it plays today in research and the sports industry.
https://arxiv.org/abs/2410.01304
Immunohistochemical (IHC) stains play a vital role in a pathologist's analysis of medical images, providing crucial diagnostic information for various diseases. Virtual staining from hematoxylin and eosin (H&E)-stained whole slide images (WSIs) allows the automatic production of other useful IHC stains without the expensive physical staining process. However, current virtual WSI generation methods based on tile-wise processing often suffer from inconsistencies in content, texture, and color at tile boundaries. These inconsistencies lead to artifacts that compromise image quality and potentially hinder accurate clinical assessment and diagnoses. To address this limitation, we propose a novel consistent WSI synthesis network, CC-WSI-Net, that extends GAN models to produce seamless synthetic whole slide images. Our CC-WSI-Net integrates a content- and color-consistency supervisor, ensuring consistency across tiles and facilitating the generation of seamless synthetic WSIs while ensuring Sox10 immunohistochemistry accuracy in melanocyte detection. We validate our method through extensive image-quality analyses, objective detection assessments, and a subjective survey with pathologists. By generating high-quality synthetic WSIs, our method opens doors for advanced virtual staining techniques with broader applications in research and clinical care.
https://arxiv.org/abs/2410.01072
Recent attention-based volumetric segmentation (VS) methods have achieved remarkable performance in the medical domain by focusing on modeling long-range dependencies. However, for voxel-wise prediction tasks, discriminative local features are key to the performance of VS models, and these are missing in attention-based VS methods. To resolve this issue, we deliberately incorporate a convolutional encoder branch alongside a transformer backbone to extract local and global features in parallel and aggregate them in a Cross Feature Mixer Module (CFMM) for better prediction of the segmentation mask. Consequently, we observe that the derived model, Y-CT-Net, achieves competitive performance on multiple medical segmentation tasks. For example, on multi-organ segmentation, Y-CT-Net achieves an 82.4% dice score, surpassing well-tuned VS transformer/CNN-like baselines UNETR/ResNet-3D by 2.9%/1.4%. Building on the success of Y-CT-Net, we extend this concept with hybrid attention models to derive the Y-CH-Net model, which brings a 3% improvement in HD95 score on the same segmentation task. The effectiveness of both Y-CT-Net and Y-CH-Net verifies our hypothesis and motivates us to introduce the concept of Y-CA-Net, a versatile generic architecture based on any two encoder backbones and a decoder, to fully exploit the complementary strengths of both convolution and attention mechanisms. Based on experimental results, we argue that Y-CA-Net is a key player in achieving superior results for volumetric segmentation.
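A toy, heavily reduced sketch of the two-encoder idea is given below: a convolutional branch for local features and a self-attention branch for global context, combined by a simple feature mixer before a small decoder. It is 2D and illustrative only, not the authors' Y-CT-Net.

```python
# Toy two-encoder segmentation sketch (2D, heavily reduced): convolutional and
# self-attention branches run in parallel and are mixed before decoding.
# Illustrative only, not the authors' Y-CT-Net/Y-CA-Net.
import torch
import torch.nn as nn

class TwoEncoderSeg(nn.Module):
    def __init__(self, in_ch=1, dim=32, num_classes=2):
        super().__init__()
        self.conv_enc = nn.Sequential(                      # local-feature branch
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.embed = nn.Conv2d(in_ch, dim, 4, stride=4)     # patchify for attention
        self.attn_enc = nn.TransformerEncoder(              # global-context branch
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=1)
        self.mixer = nn.Conv2d(2 * dim, dim, 1)             # cross-feature mixing
        self.decoder = nn.Conv2d(dim, num_classes, 1)

    def forward(self, x):
        local = self.conv_enc(x)                             # (B, dim, H, W)
        tokens = self.embed(x).flatten(2).transpose(1, 2)    # (B, HW/16, dim)
        glob = self.attn_enc(tokens).transpose(1, 2)
        glob = glob.reshape(x.shape[0], -1, x.shape[2] // 4, x.shape[3] // 4)
        glob = nn.functional.interpolate(glob, size=local.shape[2:], mode="nearest")
        return self.decoder(self.mixer(torch.cat([local, glob], dim=1)))

model = TwoEncoderSeg()
print(model(torch.randn(1, 1, 64, 64)).shape)   # torch.Size([1, 2, 64, 64])
```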
https://arxiv.org/abs/2410.01003
Across domains, metrics and measurements are fundamental to identifying challenges, informing decisions, and resolving conflicts. Despite the abundance of data available in this information age, not only can it be challenging for a single expert to work across multi-disciplinary data, but non-experts can also find it unintuitive to create effective measures or to transform theories into appropriately chosen, context-specific metrics. This technical report addresses this challenge by examining software communities within large software corporations, where different measures are used as proxies to locate counterparts within the organization to transfer tacit knowledge. We propose a prompt-engineering framework inspired by neural activities, demonstrating that generative models can extract and summarize theories and perform basic reasoning, thereby transforming concepts into context-aware metrics that support software communities given software repository data. While this research zooms in on software communities, we believe the framework's applicability extends across various fields, showcasing expert-theory-inspired metrics that aid in triaging complex challenges.
https://arxiv.org/abs/2410.00880
As Large Language Models become ubiquitous in many sectors and tasks, there is a need to reduce token usage, overcoming challenges such as short context windows, limited output sizes, and the costs associated with token intake and generation, especially in API-served LLMs. This work brings the Design Structure Matrix (DSM) from the engineering design discipline into LLM conversation optimization. Applied to a use case in which the LLM conversation is about the design of a spacecraft and its subsystems, the DSM, with its analysis tools such as clustering and sequencing, proves to be an effective tool for organizing the conversation, minimizing the number of tokens sent to or retrieved from the LLM at once, as well as grouping chunks that can be allocated to different context windows. Hence, this work broadens the current set of methodologies for token usage optimization and opens new avenues for the integration of engineering design practices into LLMs.
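To make the DSM idea concrete, the toy sketch below builds a small dependency matrix over hypothetical conversation chunks (spacecraft subsystems) and groups strongly coupled chunks so each group fits one context window; plain connected-components grouping stands in for the DSM clustering and sequencing tools discussed in the paper.

```python
# Toy DSM over hypothetical conversation chunks: strongly coupled chunks are
# grouped so each group can be sent to the LLM together. Connected components
# stand in for the paper's DSM clustering/sequencing tools.
import numpy as np

chunks = ["power", "thermal", "comms", "payload", "propulsion"]
# dsm[i, j] = 1 if chunk i depends on information from chunk j
dsm = np.array([
    [0, 1, 0, 0, 1],   # power       <-> thermal, propulsion
    [1, 0, 0, 0, 0],   # thermal
    [0, 0, 0, 1, 0],   # comms       <-> payload
    [0, 0, 1, 0, 0],   # payload
    [1, 0, 0, 0, 0],   # propulsion
])

def cluster(dsm):
    """Group mutually dependent chunks (connected components of the symmetrised DSM)."""
    adj = (dsm + dsm.T) > 0
    unseen, groups = set(range(len(dsm))), []
    while unseen:
        stack, group = [unseen.pop()], []
        while stack:
            i = stack.pop()
            group.append(i)
            for j in list(unseen):
                if adj[i, j]:
                    unseen.remove(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups

for group in cluster(dsm):
    print("send together:", [chunks[i] for i in group])
# e.g. ['power', 'thermal', 'propulsion'] and ['comms', 'payload']
```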
https://arxiv.org/abs/2410.00749