A positive margin may result in an increased risk of local recurrence after breast-conserving surgery for any malignant tumour. Providing surgeons with real-time intra-operative information on the presence of positive resection margins would help reduce their number. This study aims to design an intra-operative tumour margin evaluation scheme using specimen mammography in breast-conserving surgery. A total of 30 cases were evaluated and compared with the contours manually determined by experienced physicians and with the pathology reports. The proposed method uses image thresholding to extract regions of interest and then applies a deep learning model, SegNet, to segment tumour tissue. The width of the margin of normal tissue surrounding the tumour is evaluated as the result. The desired margin around the tumour was set to 10 mm. The smallest average difference from the manually sketched margin was 6.53 mm (±5.84 mm). In all cases, the SegNet architecture was used to obtain the tissue specimen boundary and the tumour contour, respectively. The results indicate that this technology is helpful in discriminating positive from negative margins in the intra-operative setting and that the proposed scheme is a potential procedure for an intra-operative measurement system. The experimental results reveal that deep learning techniques can produce results consistent with pathology reports.
https://arxiv.org/abs/2404.10600
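The margin-width evaluation described above can be sketched with a distance transform, assuming binary masks for the specimen and the tumour are already available (in the paper these come from thresholding and SegNet); the function name, toy masks, and pixel spacing below are illustrative, not taken from the paper.

```python
import numpy as np
from scipy import ndimage

def margin_widths_mm(specimen_mask, tumour_mask, mm_per_pixel):
    """Distance (in mm) from each tumour-contour pixel to the specimen boundary."""
    # Distance from every pixel inside the specimen to the nearest background pixel.
    dist_to_boundary = ndimage.distance_transform_edt(specimen_mask)
    # Tumour contour = tumour pixels minus their binary erosion.
    contour = tumour_mask & ~ndimage.binary_erosion(tumour_mask)
    return dist_to_boundary[contour] * mm_per_pixel

# Toy example: a 100x100 specimen with a small central tumour.
specimen = np.zeros((100, 100), bool)
specimen[10:90, 10:90] = True
tumour = np.zeros((100, 100), bool)
tumour[45:55, 45:55] = True
widths = margin_widths_mm(specimen, tumour, mm_per_pixel=0.5)
print(widths.min())  # narrowest margin; flag the case if below the 10 mm target
```

In a real pipeline the narrowest margin, rather than the average, would drive the positive/negative decision.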
This study explores F0 entrainment in second-language (L2) English speech imitation during an Alternating Reading Task (ART). Native speakers of Italian, French, and Slovak imitated English utterances, and their F0 entrainment was quantified using the Dynamic Time Warping (DTW) distance between the parameterized F0 contours of the imitated utterances and those of the model utterances. Results indicate a nuanced relationship between L2 English proficiency and entrainment: speakers with higher proficiency generally exhibit less entrainment in pitch variation and declination. However, within dyads, the more proficient speakers demonstrate a greater ability to mimic pitch range, leading to increased entrainment. This suggests that proficiency influences entrainment differently at the individual and dyadic levels, highlighting the complex interplay between language skill and prosodic adaptation.
https://arxiv.org/abs/2404.10440
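The DTW-based entrainment measure can be illustrated with a textbook dynamic-time-warping distance over two 1-D F0 contours; this is a generic sketch, not the authors' parameterization, and the toy contours and names are illustrative.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two 1-D contours."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy F0 contours: a time-stretched copy of the model contour aligns almost
# perfectly under DTW, while a flat contour does not.
model = np.sin(np.linspace(0, np.pi, 50))
stretched = np.sin(np.linspace(0, np.pi, 60))
flat = np.zeros(55)
print(dtw_distance(model, stretched) < dtw_distance(model, flat))  # True
```

A lower DTW distance between imitated and model contours would correspond to stronger entrainment.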
Navigation for thoracoabdominal puncture surgery is used to locate the needle entry point on the patient's body surface. The traditional reflective-ball navigation method struggles to position the needle entry point on the soft, irregular, smooth chest and abdomen, and because the body surface lacks clear characteristic points under structured-light technology, it is difficult to identify and locate arbitrary needle insertion points. To meet the high-stability and high-accuracy requirements of surgical navigation, this paper proposes a novel multi-modal 3D small-object medical marker detection method, which identifies the center of a small single ring as the needle insertion point. Moreover, the method leverages Fourier-transform enhancement to augment the dataset, enrich image details, and strengthen the network's capability. The method extracts the Region of Interest (ROI) of the feature image from both enhanced and original images, followed by generating a mask map. Subsequently, the point cloud of the ROI is obtained from the depth map through registration with ROI point-cloud contour fitting. In addition, the method employs the Tukey loss for optimal precision. The experimental results show that the proposed method not only achieves high-precision, high-stability positioning, but also enables the positioning of arbitrary needle insertion points.
https://arxiv.org/abs/2404.08990
Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution involves a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by a detector, LLE is primarily designed for human vision rather than machine vision and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for localizing text in the dark that circumvents the need for LLE. We introduce a constrained learning module as an auxiliary mechanism during the training stage of the text detector. This module is designed to guide the text detector in preserving textual spatial features amidst feature map resizing, thus minimizing the loss of spatial information in texts under low-light visual degradations. Specifically, we incorporate spatial reconstruction and spatial semantic constraints within this module to ensure the text detector acquires essential positional and contextual range knowledge. Our approach enhances the original text detector's ability to identify text's local topological features using a dynamic snake feature pyramid network and adopts a bottom-up contour shaping strategy with a novel rectangular accumulation technique for accurate delineation of streamlined text features. In addition, we present a comprehensive low-light dataset for arbitrary-shaped text, encompassing diverse scenes and languages. Notably, our method achieves state-of-the-art results on this low-light dataset and exhibits comparable performance on standard normal-light datasets. The code and dataset will be released.
https://arxiv.org/abs/2404.08965
Existing angle-based contour descriptors suffer from lossy representation of non-star-convex shapes. By and large, this is the result of the shape being registered with a single global inner center and a set of radii corresponding to a polar-coordinate parameterization. In this paper, we propose AdaContour, an adaptive contour descriptor that uses multiple local representations to characterize complex shapes. After hierarchically encoding object shapes in a training set and constructing a contour matrix of all subdivided regions, we compute a robust low-rank subspace and approximate each local contour by linearly combining the shared basis vectors to represent an object. Experiments show that AdaContour represents shapes more accurately and robustly than other descriptors while retaining effectiveness. We validate AdaContour by integrating it into off-the-shelf detectors to enable instance segmentation, which demonstrates faithful performance. The code is available at this https URL.
https://arxiv.org/abs/2404.08292
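The shared-basis idea can be sketched with a truncated SVD standing in for the paper's robust low-rank subspace estimation; the synthetic contour matrix, rank, and sample counts below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Contour matrix: each column is one local contour sampled at 64 radii.
# Synthetic data drawn from a 5-dimensional subspace plus small noise.
true_basis = rng.normal(size=(64, 5))
coeffs = rng.normal(size=(5, 200))
contours = true_basis @ coeffs + 0.01 * rng.normal(size=(64, 200))

# Shared basis from a truncated SVD (a simple stand-in for the robust
# low-rank subspace computation described in the abstract).
U, s, Vt = np.linalg.svd(contours, full_matrices=False)
basis = U[:, :5]

# Approximate one local contour as a linear combination of the basis vectors.
c = contours[:, 0]
recon = basis @ (basis.T @ c)
print(np.linalg.norm(c - recon) / np.linalg.norm(c))  # small relative residual
```

Storing only the 5 coefficients per local contour, instead of all 64 radii, is what makes the shared-subspace representation compact.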
Deep learning-based medical image processing algorithms require representative data during development. In particular, surgical data might be difficult to obtain, and high-quality public datasets are limited. To overcome this limitation and augment datasets, a widely adopted solution is the generation of synthetic images. In this work, we employ conditional diffusion models to generate knee radiographs from contour and bone segmentations. Two distinct strategies are presented that incorporate the segmentation as a condition into either the sampling or the training process, namely, conditional sampling and conditional training. The results demonstrate that both methods can generate realistic images while adhering to the conditioning segmentation. The conditional training method outperforms the conditional sampling method and the conventional U-Net.
https://arxiv.org/abs/2404.03541
Microvascular networks are challenging to model because these structures are currently near the diffraction limit for most advanced three-dimensional imaging modalities, including confocal and light sheet microscopy. This makes semantic segmentation difficult, because individual components of these networks fluctuate within the confines of individual pixels. Level set methods are ideally suited to solve this problem by providing surface and topological constraints on the resulting model; however, these active contour techniques are extremely time-intensive and impractical for terabyte-scale images. We propose a reformulation and implementation of the region-scalable fitting (RSF) level set model that makes it amenable to three-dimensional evaluation using both single-instruction multiple-data (SIMD) and single-program multiple-data (SPMD) parallel processing. This enables evaluation of the level set equation on independent regions of the data set using graphics processing units (GPUs), making large-scale segmentation of high-resolution networks practical and inexpensive. We tested this 3D parallel RSF approach on multiple data sets acquired using state-of-the-art imaging techniques for microvascular data, including micro-CT, light sheet fluorescence microscopy (LSFM), and milling microscopy. To assess the performance and accuracy of the RSF model, we used a Monte-Carlo-based validation technique to compare results with other segmentation methods. We also provide rigorous profiling to show the gains in processing speed from leveraging parallel hardware. This study showcases the practical application of the RSF model, emphasizing its utility in the challenging domain of segmenting large-scale high-topology network structures with a particular focus on building microvascular models.
https://arxiv.org/abs/2404.02813
Some early violins have been reduced during their history to fit imposed morphological standards, while more recent ones have been built directly to these standards. We can observe differences between reduced and unreduced instruments, particularly in their contour lines and channel of minima. In a recent preliminary work, we computed and highlighted those two features for two instruments using triangular 3D meshes acquired by photogrammetry, whose fidelity has been assessed and validated with sub-millimetre accuracy. We propose here an extension to a corpus of 38 violins, violas and cellos, and introduce improved procedures, leading to a stronger discussion of the geometric analysis. We first recall the material we are working with. We then discuss how to derive the best reference plane for the violin alignment, which is crucial for the computation of contour lines and channel of minima. Finally, we show how to compute efficiently both characteristics and we illustrate our results with a few examples.
https://arxiv.org/abs/2404.01995
Due to the uncertainty of traffic participants' intentions, generating safe but not overly cautious behavior in interactive driving scenarios remains a formidable challenge for autonomous driving. In this paper, we address this issue by combining a deep learning-based trajectory prediction model with risk potential field-based motion planning. In order to comprehensively predict the possible future trajectories of other vehicles, we propose a target-region-based trajectory prediction model (TRTP) which considers every region a vehicle may arrive at in the future. After that, we construct a risk potential field at each future time step based on the prediction results of TRTP, and integrate the risk value into the objective function of Model Predictive Contouring Control (MPCC). This enables the uncertainty of other vehicles to be taken into account during the planning process. Balancing risk against progress along the reference path achieves both driving safety and efficiency at the same time. We also demonstrate the safety and effectiveness of our method in the CARLA simulator.
https://arxiv.org/abs/2404.00893
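The idea of folding a risk potential field into a planner's objective can be sketched as below; the Gaussian field, the weights, and the cost form are illustrative stand-ins, not the paper's actual TRTP or MPCC formulation.

```python
import numpy as np

def risk_field(point, predicted_positions, sigma=2.0):
    """Gaussian risk potential summed over predicted positions of other vehicles."""
    d2 = np.sum((predicted_positions - point) ** 2, axis=1)
    return float(np.sum(np.exp(-d2 / (2 * sigma ** 2))))

def stage_cost(point, progress, predicted_positions, w_risk=10.0, w_prog=1.0):
    # Trade risk off against progress along the reference path: high risk
    # near predicted traffic, reward (negative cost) for making progress.
    return w_risk * risk_field(point, predicted_positions) - w_prog * progress

# A candidate point far from the predicted traffic is cheaper than one nearby.
others = np.array([[10.0, 0.0], [12.0, 1.0]])  # predicted positions at one step
near = stage_cost(np.array([10.5, 0.2]), progress=5.0, predicted_positions=others)
far = stage_cost(np.array([0.0, -3.0]), progress=5.0, predicted_positions=others)
print(far < near)  # True
```

In an MPC-style planner this stage cost would be summed over the horizon, with one risk field per predicted time step.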
Lumbar disc degeneration, a progressive structural wear and tear of the lumbar intervertebral disc, is regarded as playing an essential role in low back pain, a significant global health concern. Automated lumbar spine geometry reconstruction from MR images will enable fast measurement of medical parameters to evaluate the lumbar status, in order to determine a suitable treatment. Existing image segmentation-based techniques often generate erroneous segments or unstructured point clouds, unsuitable for medical parameter measurement. In this work, we present TransDeformer: a novel attention-based deep learning approach that reconstructs the contours of the lumbar spine with high spatial accuracy and mesh correspondence across patients, and we also present a variant of TransDeformer for error estimation. Specifically, we devise new attention modules with a new attention formula, which integrates image features and tokenized contour features to predict the displacements of the points on a shape template without the need for image segmentation. The deformed template reveals the lumbar spine geometry in the input image. We develop a multi-stage training strategy to enhance model robustness with respect to template initialization. Experiment results show that our TransDeformer generates artifact-free geometry outputs, and its variant predicts the error of a reconstructed geometry. Our code is available at this https URL.
https://arxiv.org/abs/2404.00231
Time-optimal quadrotor flight is an extremely challenging problem due to the limited control authority encountered at the limit of handling. Model Predictive Contouring Control (MPCC) has emerged as a leading model-based approach for time optimization problems such as drone racing. However, the standard MPCC formulation used in quadrotor racing introduces the notion of the gates directly in the cost function, creating a multi-objective optimization that continuously trades off between maximizing progress and tracking the path accurately. This paper introduces three key components that enhance the MPCC approach for drone racing. First and foremost, we provide safety guarantees in the form of a constraint and terminal set. The safety set is designed as a spatial constraint which prevents gate collisions while allowing for time optimization only in the cost function. Second, we augment the existing first-principles dynamics with a residual term that captures complex aerodynamic effects and thrust forces learned directly from real-world data. Third, we use Trust Region Bayesian Optimization (TuRBO), a state-of-the-art global Bayesian optimization algorithm, to tune the hyperparameters of the MPC controller given a sparse reward based on lap-time minimization. The proposed approach achieves lap times similar to the best state-of-the-art RL and outperforms the best time-optimal controller while satisfying constraints. In both simulation and the real world, our approach consistently prevents gate crashes with a 100% success rate, while pushing the quadrotor to its physical limit, reaching speeds of more than 80 km/h.
https://arxiv.org/abs/2403.17551
Recently, machine learning-based semantic segmentation algorithms have demonstrated their potential to accurately segment regions and contours in medical images, allowing precise localization of anatomical structures and abnormalities. Although medical images are difficult to acquire and annotate, semi-supervised learning methods are efficient in dealing with the scarcity of labeled data. However, overfitting is almost inevitable due to the limited images for training. Furthermore, the intricate shapes of organs and lesions in medical images introduce additional complexity in different cases, preventing networks from acquiring a strong ability to generalize. To this end, we introduce a novel method called Scaling-up Mix with Multi-Class (SM2C). This method uses three strategies - scaling-up image size, multi-class mixing, and object shape jittering - to improve the ability to learn semantic features within medical images. By diversifying the shape of the segmentation objects and enriching the semantic information within each sample, the SM2C demonstrates its potential, especially when training on unlabelled data. Extensive experiments demonstrate the effectiveness of the SM2C on three benchmark medical image segmentation datasets. The proposed framework shows significant improvements over state-of-the-art counterparts.
https://arxiv.org/abs/2403.16009
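One of the SM2C ingredients, multi-class mixing, can be sketched as a CutMix-style paste of one sample's labelled objects onto another; this is a simplified stand-in for the paper's three strategies, and the shapes, class ids, and function name are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def multi_class_mix(img_a, mask_a, img_b, mask_b):
    """Paste the labelled object pixels of sample B onto sample A, so the
    mixed sample contains objects of several classes at once."""
    mixed_img = img_a.copy()
    mixed_mask = mask_a.copy()
    paste = mask_b > 0
    mixed_img[paste] = img_b[paste]
    mixed_mask[paste] = mask_b[paste]
    return mixed_img, mixed_mask

# Toy samples: class 1 object in A, class 2 object in B.
img_a = rng.random((64, 64)); mask_a = np.zeros((64, 64), np.int64)
mask_a[10:20, 10:20] = 1
img_b = rng.random((64, 64)); mask_b = np.zeros((64, 64), np.int64)
mask_b[40:60, 40:60] = 2
img, mask = multi_class_mix(img_a, mask_a, img_b, mask_b)
print(np.unique(mask).tolist())  # [0, 1, 2]
```

Combined with upscaling and shape jittering, such mixed samples diversify object shapes and enrich the per-sample semantics, which is the stated goal of SM2C.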
The widespread availability of publicly accessible medical images has significantly propelled advancements in various research and clinical fields. Nonetheless, concerns regarding unauthorized training of AI systems for commercial purposes and the duties of patient privacy protection have led numerous institutions to hesitate to share their images. This is particularly true for medical image segmentation (MIS) datasets, where the processes of collection and fine-grained annotation are time-intensive and laborious. Recently, Unlearnable Examples (UEs) methods have shown the potential to protect images by adding invisible shortcuts. These shortcuts can prevent unauthorized deep neural networks from generalizing. However, existing UEs are designed for natural image classification and fail to protect MIS datasets imperceptibly as their protective perturbations are less learnable than important prior knowledge in MIS, e.g., contour and texture features. To this end, we propose an Unlearnable Medical image generation method, termed UMed. UMed integrates the prior knowledge of MIS by injecting contour- and texture-aware perturbations to protect images. Given that our target is to only poison features critical to MIS, UMed requires only minimal perturbations within the ROI and its contour to achieve greater imperceptibility (average PSNR is 50.03) and protective performance (clean average DSC degrades from 82.18% to 6.80%).
https://arxiv.org/abs/2403.14250
This paper introduces CLIPSwarm, a new algorithm designed to automate the modeling of swarm drone formations based on natural language. The algorithm begins by enriching a provided word, to compose a text prompt that serves as input to an iterative approach to find the formation that best matches the provided word. The algorithm iteratively refines formations of robots to align with the textual description, employing different steps for "exploration" and "exploitation". Our framework is currently evaluated on simple formation targets, limited to contour shapes. A formation is visually represented through alpha-shape contours and the most representative color is automatically found for the input word. To measure the similarity between the description and the visual representation of the formation, we use CLIP [1], encoding text and images into vectors and assessing their similarity. Subsequently, the algorithm rearranges the formation to visually represent the word more effectively, within the given constraints of available drones. Control actions are then assigned to the drones, ensuring robotic behavior and collision-free movement. Experimental results demonstrate the system's efficacy in accurately modeling robot formations from natural language descriptions. The algorithm's versatility is showcased through the execution of drone shows in photorealistic simulation with varying shapes. We refer the reader to the supplementary video for a visual reference of the results.
https://arxiv.org/abs/2403.13467
We introduce the Multi-Robot Connected Fermat Spiral (MCFS), a novel algorithmic framework for Multi-Robot Coverage Path Planning (MCPP) that adapts Connected Fermat Spiral (CFS) from the computer graphics community to multi-robot coordination for the first time. MCFS uniquely enables the orchestration of multiple robots to generate coverage paths that contour around arbitrarily shaped obstacles, a feature that is notably lacking in traditional methods. Our framework not only enhances area coverage and optimizes task performance, particularly in terms of makespan, for workspaces rich in irregular obstacles but also addresses the challenges of path continuity and curvature critical for non-holonomic robots by generating smooth paths without decomposing the workspace. MCFS solves MCPP by constructing a graph of isolines and transforming MCPP into a combinatorial optimization problem, aiming to minimize the makespan while covering all vertices. Our contributions include developing a unified CFS version for scalable and adaptable MCPP, extending it to MCPP with novel optimization techniques for cost reduction and path continuity and smoothness, and demonstrating through extensive experiments that MCFS outperforms existing MCPP methods in makespan, path curvature, coverage ratio, and overlapping ratio. Our research marks a significant step in MCPP, showcasing the fusion of computer graphics and automated planning principles to advance the capabilities of multi-robot systems in complex environments. Our code is available at this https URL.
https://arxiv.org/abs/2403.13311
The role of a motion planner is pivotal in quadrotor applications, yet existing methods often struggle to adapt to complex environments, limiting their ability to achieve fast, safe, and robust flight. In this letter, we introduce a performance-enhanced quadrotor motion planner designed for autonomous flight in complex environments including dense obstacles, dynamic obstacles, and unknown disturbances. The global planner generates an initial trajectory through kinodynamic path searching and refines it using B-spline trajectory optimization. Subsequently, the local planner takes into account the quadrotor dynamics, estimated disturbance, global reference trajectory, control cost, time cost, and safety constraints to generate real-time control inputs, utilizing the framework of model predictive contouring control. Both simulations and real-world experiments corroborate the heightened robustness, safety, and speed of the proposed motion planner. Additionally, our motion planner achieves flights at more than 6.8 m/s in a challenging and complex racing scenario.
https://arxiv.org/abs/2403.12865
Ultrasound video-based breast lesion segmentation provides valuable assistance in early breast lesion detection and treatment. However, existing works mainly focus on lesion segmentation based on ultrasound breast images, which usually cannot be adapted well to obtain desirable results on ultrasound videos. The main challenge for ultrasound video-based breast lesion segmentation is how to exploit intra-frame and inter-frame lesion cues simultaneously. To address this problem, we propose a novel Spatial-Temporal Progressive Fusion Network (STPFNet) for the video-based breast lesion segmentation problem. The main aspects of the proposed STPFNet are threefold. First, we propose to adopt a unified network architecture to capture both spatial dependencies within each ultrasound frame and temporal correlations between different frames for ultrasound data representation. Second, we propose a new fusion module, termed Multi-Scale Feature Fusion (MSFF), to fuse spatial and temporal cues together for lesion detection. MSFF can help to determine the boundary contour of the lesion region to overcome the issue of lesion boundary blurring. Third, we propose to exploit the segmentation result of the previous frame as prior knowledge to suppress the noisy background and learn a more robust representation. In particular, we introduce a new publicly available ultrasound video breast lesion segmentation dataset, termed UVBLS200, which is specifically dedicated to breast lesion segmentation. It contains 200 videos, including 80 videos of benign lesions and 120 videos of malignant lesions. Experiments on the proposed dataset demonstrate that the proposed STPFNet achieves better breast lesion detection performance than state-of-the-art methods.
https://arxiv.org/abs/2403.11699
Recently, circle representation has been introduced for medical imaging, designed specifically to enhance the detection of instance objects that are spherically shaped (e.g., cells, glomeruli, and nuclei). Given its outstanding effectiveness in instance detection, it is compelling to consider the application of circle representation for segmenting instance medical objects. In this study, we introduce CircleSnake, a simple end-to-end segmentation approach that utilizes circle contour deformation for segmenting ball-shaped medical objects at the instance level. The innovation of CircleSnake lies in these three areas: (1) It substitutes the complex bounding box-to-octagon contour transformation with a more consistent and rotation-invariant bounding circle-to-circle contour adaptation. This adaptation specifically targets ball-shaped medical objects. (2) The circle representation employed in CircleSnake significantly reduces the degrees of freedom to two, compared to eight in the octagon representation. This reduction enhances both the robustness of the segmentation performance and the rotational consistency of the method. (3) CircleSnake is the first end-to-end deep instance segmentation pipeline to incorporate circle representation, encompassing consistent circle detection, circle contour proposal, and circular convolution in a unified framework. This integration is achieved through the novel application of circular graph convolution within the context of circle detection and instance segmentation. In practical applications, such as the detection of glomeruli, nuclei, and eosinophils in pathological images, CircleSnake has demonstrated superior performance and greater rotation invariance when compared to benchmarks. The code has been made publicly available: this https URL.
https://arxiv.org/abs/2403.11507
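The low-degree-of-freedom circle contour can be illustrated by sampling vertices from a detected bounding circle; the function below is a generic sketch of the representation only, not CircleSnake's detection or deformation pipeline, and its names are illustrative.

```python
import math

def circle_contour(cx, cy, r, n_points=16):
    """Sample n_points contour vertices from a bounding circle.

    Compared with an octagon parameterization (eight learned offsets), the
    circle needs only a radius on top of the detected center, and the
    sampled contour is rotation-invariant up to the starting angle.
    """
    return [
        (cx + r * math.cos(2 * math.pi * k / n_points),
         cy + r * math.sin(2 * math.pi * k / n_points))
        for k in range(n_points)
    ]

pts = circle_contour(0.0, 0.0, 5.0, n_points=8)
# Every sampled vertex lies at radius r from the center.
print(all(abs(math.hypot(x, y) - 5.0) < 1e-9 for x, y in pts))  # True
```

In a snake-style model, these initial vertices would then be refined toward the true object boundary by the learned contour deformation.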
Accurately translating medical images across different modalities (e.g., CT to MRI) has numerous downstream clinical and machine learning applications. While several methods have been proposed to achieve this, they often prioritize perceptual quality with respect to output domain features over preserving anatomical fidelity. However, maintaining anatomy during translation is essential for many tasks, e.g., when leveraging masks from the input domain to develop a segmentation model with images translated to the output domain. To address these challenges, we propose ContourDiff, a novel framework that leverages domain-invariant anatomical contour representations of images. These representations are simple to extract from images, yet form precise spatial constraints on their anatomical content. We introduce a diffusion model that converts contour representations of images from arbitrary input domains into images in the output domain of interest. By applying the contour as a constraint at every diffusion sampling step, we ensure the preservation of anatomical content. We evaluate our method by training a segmentation model on images translated from CT to MRI with their original CT masks and testing its performance on real MRIs. Our method outperforms other unpaired image translation methods by a significant margin, furthermore without the need to access any input domain information during training.
https://arxiv.org/abs/2403.10786
The remote photoplethysmography (rPPG) technique extracts blood volume pulse (BVP) signals from subtle pixel changes in video frames. This study introduces rFaceNet, an advanced rPPG method that enhances the extraction of facial BVP signals with a focus on facial contours. rFaceNet integrates identity-specific facial contour information and eliminates redundant data. It efficiently extracts facial contours from temporally normalized frame inputs through a Temporal Compressor Unit (TCU) and steers the model's focus to relevant facial regions using the Cross-Task Feature Combiner (CTFC). Through elaborate training, the quality and interpretability of the facial physiological signals extracted by rFaceNet are greatly improved compared to previous methods. Moreover, our novel approach outperforms SOTA methods on various heart rate estimation benchmarks.
https://arxiv.org/abs/2403.09034