Decreased myocardial capillary density has been reported as an important histopathological feature associated with various heart disorders. Quantitative assessment of cardiac capillarization typically involves double immunostaining of cardiomyocytes (CMs) and capillaries in myocardial slices. In contrast, single immunostaining of basement membrane components is a straightforward approach to simultaneously label CMs and capillaries, with fewer background-staining challenges. However, subsequent image analysis still requires manual work to identify and segment CMs and capillaries. Here, we developed an image analysis tool, AutoQC, to automatically identify and segment CMs and capillaries in immunofluorescence images of collagen type IV, a predominant basement membrane protein within the myocardium. In addition, commonly used capillarization-related measurements can be derived from the segmentation masks. AutoQC features a weakly supervised instance segmentation algorithm that leverages a pre-trained segmentation model via prompt engineering. AutoQC outperformed YOLOv8-Seg, a state-of-the-art instance segmentation model, in both instance segmentation and capillarization assessment. Furthermore, training AutoQC requires only a small dataset with bounding box annotations instead of pixel-wise annotations, reducing the annotation workload. AutoQC provides an automated solution for quantifying cardiac capillarization in basement-membrane-immunostained myocardial slices, eliminating the need for manual image analysis once trained.
https://arxiv.org/abs/2311.18173
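The measurement-derivation step described above reduces to counting instances over label masks. The sketch below is hypothetical (AutoQC's code is not shown in the abstract); the function name, the metric definitions, and the 1 µm/px scale are illustrative assumptions.

```python
import numpy as np

def capillarization_metrics(cm_mask, cap_mask, um_per_px=1.0):
    """Derive common capillarization measurements from two instance masks.

    cm_mask, cap_mask: 2-D integer arrays; 0 = background, k > 0 = instance k.
    Returns capillary density (capillaries per mm^2) and capillary-to-CM ratio.
    """
    n_cm = len(np.unique(cm_mask)) - (1 if (cm_mask == 0).any() else 0)
    n_cap = len(np.unique(cap_mask)) - (1 if (cap_mask == 0).any() else 0)
    area_mm2 = cm_mask.size * (um_per_px ** 2) / 1e6  # field of view in mm^2
    return {
        "capillary_density": n_cap / area_mm2,
        "cap_to_cm_ratio": n_cap / max(n_cm, 1),
    }

# Toy example: 3 CMs and 6 capillaries in a 100x100 px field at 1 um/px.
cm = np.zeros((100, 100), int)
cm[10:30, 10:30], cm[40:60, 40:60], cm[70:90, 70:90] = 1, 2, 3
cap = np.zeros((100, 100), int)
for k, (r, c) in enumerate([(5, 50), (20, 70), (35, 15), (50, 90), (65, 25), (80, 50)], 1):
    cap[r:r + 3, c:c + 3] = k
m = capillarization_metrics(cm, cap, um_per_px=1.0)
print(m)  # cap_to_cm_ratio = 6 / 3 = 2.0
```

With real data the instance masks would come from the trained segmentation model rather than being drawn by hand.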
Cell segmentation in histopathological images plays a crucial role in understanding, diagnosing, and treating many diseases. However, data annotation for this is expensive since there can be a large number of cells per image, and expert pathologists are needed for labelling images. Instead, our paper focuses on using weak supervision -- annotation from related tasks -- to induce a segmenter. Recent foundation models, such as Segment Anything (SAM), can use prompts to leverage additional supervision during inference. SAM has performed remarkably well in natural image segmentation tasks; however, its applicability to cell segmentation has not been explored. In response, we investigate guiding the prompting procedure in SAM for weakly supervised cell segmentation when only bounding box supervision is available. We develop two workflows: (1) an object detector's output as a test-time prompt to SAM (D-SAM), and (2) SAM as pseudo mask generator over training data to train a standalone segmentation model (SAM-S). On finding that both workflows have some complementary strengths, we develop an integer programming-based approach to reconcile the two sets of segmentation masks, achieving yet higher performance. We experiment on three publicly available cell segmentation datasets namely, ConSep, MoNuSeg, and TNBC, and find that all SAM-based solutions hugely outperform existing weakly supervised image segmentation models, obtaining 9-15 pt Dice gains.
https://arxiv.org/abs/2311.17960
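The reconciliation step above can be illustrated with a toy stand-in. The paper formulates it as an integer program; for small instance counts the same one-to-one assignment can be solved exhaustively. Everything below (function names, the intersection merge rule) is illustrative, not the authors' implementation.

```python
import numpy as np
from itertools import permutations

def iou(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def reconcile(masks_a, masks_b):
    """Exhaustively solve the one-to-one assignment maximizing total IoU
    between two candidate mask sets (a stand-in for the paper's integer
    program; exact only for small set sizes), then merge each matched pair."""
    n = len(masks_a)
    best_perm, best_score = None, -1.0
    for perm in permutations(range(len(masks_b)), n):
        score = sum(iou(masks_a[i], masks_b[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_perm = score, perm
    # Keep the intersection of each matched pair as the reconciled mask.
    return [np.logical_and(masks_a[i], masks_b[j]) for i, j in enumerate(best_perm)]

# Toy 8x8 example with two cells; the second set arrives in shuffled order.
a1 = np.zeros((8, 8), bool); a1[1:4, 1:4] = True
a2 = np.zeros((8, 8), bool); a2[5:8, 5:8] = True
b1 = np.zeros((8, 8), bool); b1[5:8, 4:8] = True   # overlaps a2
b2 = np.zeros((8, 8), bool); b2[1:4, 0:4] = True   # overlaps a1
merged = reconcile([a1, a2], [b1, b2])
```

An integer-programming solver would replace the brute-force loop once the number of cells per image grows beyond a handful.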
Generating vivid and emotional 3D co-speech gestures is crucial for virtual avatar animation in human-machine interaction applications. While existing methods can generate gestures that follow a single emotion label, they overlook the fact that modeling long gesture sequences with emotion transitions is more practical in real scenes. In addition, the lack of large-scale datasets with emotion transition speech and corresponding 3D human gestures also limits progress on this task. To fulfill this goal, we first incorporate ChatGPT-4 and an audio inpainting approach to construct high-fidelity emotion transition human speech. Considering that obtaining realistic 3D pose annotations corresponding to the dynamically inpainted emotion transition audio is extremely difficult, we propose a novel weakly supervised training strategy to encourage authentic gesture transitions. Specifically, to enhance the coordination of transition gestures with respect to different emotional ones, we model the temporal association representation between two different emotional gesture sequences as style guidance and infuse it into the transition generation. We further devise an emotion mixture mechanism that provides weak supervision based on a learnable mixed emotion label for transition gestures. Last, we present a keyframe sampler to supply effective initial posture cues in long sequences, enabling us to generate diverse gestures. Extensive experiments demonstrate that our method outperforms state-of-the-art models constructed by adapting single emotion-conditioned counterparts on our newly defined emotion transition task and datasets.
https://arxiv.org/abs/2311.17532
We present Decomposer, a semi-supervised reconstruction model that decomposes distorted image sequences into their fundamental building blocks - the original image and the applied augmentations, i.e., shadow, light, and occlusions. To solve this problem, we use the SIDAR dataset that provides a large number of distorted image sequences: each sequence contains images with shadows, lighting, and occlusions applied to an undistorted version. Each distortion changes the original signal in different ways, e.g., additive or multiplicative noise. We propose a transformer-based model to explicitly learn this decomposition. The sequential model uses 3D Swin-Transformers for spatio-temporal encoding and 3D U-Nets as prediction heads for individual parts of the decomposition. We demonstrate that by separately pre-training our model on weakly supervised pseudo labels, we can steer our model to optimize for our ambiguous problem definition and learn to differentiate between the different image distortions.
https://arxiv.org/abs/2311.16829
Developing end-to-end models for long-video action understanding tasks presents significant computational and memory challenges. Existing works generally build models on long-video features extracted by off-the-shelf action recognition models, which are trained on short-video datasets in different domains, so the extracted features suffer from domain discrepancy. To avoid this, action recognition models can be trained end-to-end on clips, which are trimmed from long videos and labeled using action interval annotations. Such fully supervised annotations are expensive to collect. Thus, a weakly supervised method is needed for long-video action understanding at scale. Under the weak supervision setting, action labels are provided for the whole video without the precise start and end times of each action clip. To this end, we propose the AdaFocus framework. AdaFocus estimates the spike-actionness and temporal positions of actions, enabling it to adaptively focus on the action clips that facilitate better training, without the need for precise annotations. Experiments on three long-video datasets show its effectiveness. Remarkably, on two of the datasets, models trained with AdaFocus under weak supervision outperform those trained under full supervision. Furthermore, we form a weakly supervised feature extraction pipeline with AdaFocus, which enables significant improvements on three long-video action understanding tasks.
https://arxiv.org/abs/2311.17118
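The core idea of adaptively focusing on likely action clips can be sketched with a toy score-based window picker. This is a hedged simplification, not AdaFocus itself: the function name and the fixed window width are assumptions, whereas the real model estimates actionness and positions jointly during training.

```python
import numpy as np

def focus_window(actionness, width):
    """Pick the clip window with the highest summed actionness: a toy
    version of adaptively focusing training on likely action clips when
    only video-level labels (no start/end times) are available."""
    scores = np.convolve(actionness, np.ones(width), mode="valid")
    start = int(np.argmax(scores))
    return start, start + width

# Per-clip actionness for a 10-clip video; the action sits around clips 4-6.
act = np.array([0.05, 0.1, 0.1, 0.2, 0.9, 0.95, 0.8, 0.2, 0.1, 0.05])
print(focus_window(act, width=3))  # -> (4, 7)
```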
In weakly supervised video anomaly detection (WVAD), where only video-level labels indicating the presence or absence of abnormal events are available, the primary challenge arises from the inherent ambiguity in temporal annotations of abnormal occurrences. Inspired by the statistical insight that temporal features of abnormal events often exhibit outlier characteristics, we propose a novel method, BN-WVAD, which incorporates BatchNorm into WVAD. In the proposed BN-WVAD, we leverage the Divergence of Feature from Mean vector (DFM) of BatchNorm as a reliable abnormality criterion to discern potential abnormal snippets in abnormal videos. The proposed DFM criterion is also discriminative for anomaly recognition and more resilient to label noise, serving as the additional anomaly score to amend the prediction of the anomaly classifier that is susceptible to noisy labels. Moreover, a batch-level selection strategy is devised to filter more abnormal snippets in videos where more abnormal events occur. The proposed BN-WVAD model demonstrates state-of-the-art performance on UCF-Crime with an AUC of 87.24%, and XD-Violence, where AP reaches up to 84.93%. Our code implementation is accessible at this https URL.
https://arxiv.org/abs/2311.15367
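The DFM criterion above can be rendered as a few lines of array code. This is a simplified reading of the idea (distance of each snippet feature from the BatchNorm running statistics); the exact normalization and aggregation in BN-WVAD may differ.

```python
import numpy as np

def dfm_score(features, mean, var, eps=1e-5):
    """Divergence-of-Feature-from-Mean: distance of each snippet feature
    from the BatchNorm running mean, scaled by the running variance.
    Outlier snippets in abnormal videos should score high."""
    return np.linalg.norm((features - mean) / np.sqrt(var + eps), axis=-1)

rng = np.random.default_rng(0)
mean, var = np.zeros(16), np.ones(16)
normal = rng.normal(0.0, 1.0, size=(8, 16))    # in-distribution snippets
abnormal = rng.normal(3.0, 1.0, size=(2, 16))  # shifted, outlier snippets
scores = dfm_score(np.vstack([normal, abnormal]), mean, var)
```

In the paper this score supplements the anomaly classifier's prediction, which is more susceptible to noisy video-level labels.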
This article presents a weakly supervised machine learning method, which we call DAS-N2N, for suppressing strong random noise in distributed acoustic sensing (DAS) recordings. DAS-N2N requires no manually produced labels (i.e., pre-determined examples of clean event signals or sections of noise) for training and aims to map random noise processes to a chosen summary statistic, such as the distribution mean, median or mode, whilst retaining the true underlying signal. This is achieved by splicing (joining together) two fibres hosted within a single optical cable, recording two noisy copies of the same underlying signal corrupted by different independent realizations of random observational noise. A deep learning model can then be trained using only these two noisy copies of the data to produce a near fully-denoised copy. Once the model is trained, only noisy data from a single fibre is required. Using a dataset from a DAS array deployed on the surface of the Rutford Ice Stream in Antarctica, we demonstrate that DAS-N2N greatly suppresses incoherent noise and enhances the signal-to-noise ratios (SNR) of natural microseismic icequake events. We further show that this approach is inherently more efficient and effective than standard stop/pass band and white noise (e.g., Wiener) filtering routines, as well as a comparable self-supervised learning method based on masking individual DAS channels. Our preferred model for this task is lightweight, processing 30 seconds of data recorded at a sampling frequency of 1000 Hz over 985 channels (approx. 1 km of fiber) in $<$1 s. Due to the high noise levels in DAS recordings, efficient data-driven denoising methods, such as DAS-N2N, will prove essential to time-critical DAS earthquake detection, particularly in the case of microseismic monitoring.
https://arxiv.org/abs/2304.08120
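The two-fibre training principle can be demonstrated with a linear least-squares stand-in for the deep model: fit a filter that predicts the second noisy copy from windows of the first. Because the two noise realizations are independent, the MSE-optimal predictor recovers the shared underlying signal. All names and the 11-tap filter are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def n2n_fit(y1, y2, half=5):
    """Noise2Noise-style fit: learn a linear filter mapping windows of the
    first noisy fibre copy to samples of the second. Independent noise on
    the target averages out, so the filter approximates the clean signal."""
    w = 2 * half + 1
    X = np.stack([y1[i:i + w] for i in range(len(y1) - w + 1)])
    t = y2[half:len(y1) - half]
    coef, *_ = np.linalg.lstsq(X, t, rcond=None)
    return X @ coef  # denoised estimate of the interior samples

rng = np.random.default_rng(1)
clean = np.sin(np.linspace(0, 8 * np.pi, 2048))
y1 = clean + rng.normal(0, 1.0, clean.size)  # fibre 1
y2 = clean + rng.normal(0, 1.0, clean.size)  # fibre 2 (independent noise)
den = n2n_fit(y1, y2)
mse_noisy = np.mean((y1[5:-5] - clean[5:-5]) ** 2)
mse_den = np.mean((den - clean[5:-5]) ** 2)
```

Note that, as in DAS-N2N, no clean signal is ever used for fitting; `clean` appears here only to evaluate the result.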
Hypoxia occurs when tumour cells outgrow their blood supply, leading to regions of low oxygen levels within the tumour. Calculating hypoxia levels can be an important step in understanding the biology of tumours, their clinical progression and response to treatment. This study demonstrates a novel application of deep learning to evaluate hypoxia in the context of breast cancer histomorphology. More precisely, we show that Weakly Supervised Deep Learning (WSDL) models can accurately detect hypoxia-associated features in routine Hematoxylin and Eosin (H&E) whole slide images (WSI). We trained and evaluated a deep Multiple Instance Learning model on tiles from H&E WSIs of breast cancer primary sites (n=240), obtaining an average AUC of 0.87 on a left-out test set. We also showed significant differences between features of hypoxic and normoxic tissue regions as distinguished by the WSDL models. Such DL hypoxia H&E WSI detection models could potentially be extended to other tumour types and easily integrated into the pathology workflow without requiring additional costly assays.
https://arxiv.org/abs/2311.12601
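A generic attention-based MIL pooling step, of the kind such slide-level models use to aggregate tile features into one bag embedding, can be sketched as below. The abstract does not specify this paper's exact architecture, so treat the parameterization (tanh gating, learned vectors `V` and `w`) as a standard illustrative choice.

```python
import numpy as np

def attention_mil(tile_feats, V, w):
    """Attention-based MIL pooling: score each tile, softmax the scores,
    and return the attention-weighted bag embedding plus per-tile weights
    (the weights double as an interpretability map over the slide)."""
    scores = np.tanh(tile_feats @ V) @ w  # (n_tiles,)
    a = np.exp(scores - scores.max())
    a /= a.sum()                          # attention weights sum to 1
    return a @ tile_feats, a              # bag embedding, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 32))                    # 50 tiles from one WSI
V, w = rng.normal(size=(32, 8)), rng.normal(size=8)  # toy attention params
bag, attn = attention_mil(feats, V, w)
```

The bag embedding would then feed a slide-level classifier trained only on examination-level labels.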
Deep learning is revolutionising pathology, offering novel opportunities in disease prognosis and personalised treatment. Historically, stain normalisation has been a crucial preprocessing step in computational pathology pipelines, and persists into the deep learning era. Yet, with the emergence of feature extractors trained using self-supervised learning (SSL) on diverse pathology datasets, we call this practice into question. In an empirical evaluation of publicly available feature extractors, we find that omitting stain normalisation and image augmentations does not compromise downstream performance, while incurring substantial savings in memory and compute. Further, we show that the top-performing feature extractors are remarkably robust to variations in stain and augmentations like rotation in their latent space. Contrary to previous patch-level benchmarking studies, our approach emphasises clinical relevance by focusing on slide-level prediction tasks in a weakly supervised setting with external validation cohorts. This work represents the most comprehensive robustness evaluation of public pathology SSL feature extractors to date, involving more than 6,000 training runs across nine tasks, five datasets, three downstream architectures, and various preprocessing setups. Our findings stand to streamline digital pathology workflows by minimising preprocessing needs and informing the selection of feature extractors.
https://arxiv.org/abs/2311.11772
This study presents a weakly supervised method for identifying faults in infrared images of substation equipment. It utilizes the Faster RCNN model for equipment identification, enhancing detection accuracy through modifications to the model's network structure and parameters. The method is exemplified through the analysis of infrared images captured by inspection robots at substations. Performance is validated against manually marked results, demonstrating that the proposed algorithm significantly enhances the accuracy of fault identification across various equipment types.
https://arxiv.org/abs/2311.11214
Breast cancer diagnosis challenges both patients and clinicians, with early detection being crucial for effective treatment. Ultrasound imaging plays a key role here, but its utility is hampered by the need for precise lesion segmentation, a task that is both time-consuming and labor-intensive. To address these challenges, we propose a new framework: a morphology-enhanced, Class Activation Map (CAM)-guided model, which is optimized using a computer vision foundation model known as SAM. This framework is specifically designed for weakly supervised lesion segmentation in early-stage breast ultrasound images. Our approach uniquely leverages image-level annotations, removing the requirement for detailed pixel-level annotation. Initially, we perform a preliminary segmentation using breast lesion morphology knowledge. Following this, we accurately localize lesions by extracting semantic information through a CAM-based heatmap. These two elements are then fused together, serving as a prompt to guide SAM in performing refined segmentation. Subsequently, post-processing techniques are employed to rectify topological errors made by SAM. Our method not only simplifies the segmentation process but also attains accuracy comparable to supervised learning methods that rely on pixel-level annotation: it achieves a Dice score of 74.39% on the test set. Additionally, it outperforms a supervised learning model in terms of the Hausdorff distance, scoring 24.27 compared to Deeplabv3+'s 32.22. These experimental results showcase the feasibility and superior performance of integrating weakly supervised learning with SAM. The code is made available at: this https URL.
https://arxiv.org/abs/2311.11176
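The CAM-to-prompt step described above can be sketched as thresholding the heatmap and taking the extent of the activated region as a bounding-box prompt. This is a simplified stand-in for the paper's fusion of morphology and CAM cues; the function name and the 0.5 threshold are assumptions.

```python
import numpy as np

def cam_to_box(cam, thresh=0.5):
    """Turn a CAM heatmap into a bounding-box prompt for SAM by
    thresholding relative to the peak activation and taking the extent
    of the activated region (returned as x0, y0, x1, y1)."""
    ys, xs = np.nonzero(cam >= thresh * cam.max())
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

cam = np.zeros((64, 64))
cam[20:40, 10:30] = 1.0   # toy lesion activation
print(cam_to_box(cam))    # -> (10, 20, 29, 39)
```

In the full pipeline this box (together with the morphology-based preliminary mask) would be passed to SAM's predictor as its prompt.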
In surgical procedures, correct instrument counting is essential. Instance segmentation is a localization method that delineates not only an object's bounding box but also each pixel belonging to it. However, obtaining mask-level annotations is labor-intensive in instance segmentation. To address this issue, we propose a novel yet effective weakly supervised surgical instrument instance segmentation approach, named Point-based Weakly-supervised Instance Segmentation (PWISeg). PWISeg adopts an FCN-based architecture with point-to-box and point-to-mask branches to model the relationships between feature points and bounding boxes, as well as between feature points and segmentation masks on the FPN, accomplishing instrument detection and segmentation jointly in a single model. Since mask-level annotations are hard to obtain in the real world, for point-to-mask training we introduce an unsupervised projection loss, using the projected relation between predicted masks and bounding boxes as the supervision signal. On the other hand, we annotate a few pixels as key pixels for each instrument. Based on this, we further propose a key pixel association loss and a key pixel distribution loss, driving the point-to-mask branch to generate more accurate segmentation predictions. To comprehensively evaluate this task, we release a novel surgical instrument dataset with manual annotations, setting up a benchmark for further research. Our comprehensive experiments validate the superior performance of PWISeg: the accuracy of surgical instrument segmentation is improved, surpassing most instance segmentation methods weakly supervised by bounding boxes. This improvement is consistently observed on our proposed dataset and on the public HOSPI-Tools dataset.
https://arxiv.org/abs/2311.09819
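The projection idea behind the unsupervised loss can be sketched as follows: the row and column maxima of the predicted soft mask should match the box's indicator along each axis. The abstract does not give the exact loss form, so the squared-error formulation below is an illustrative assumption.

```python
import numpy as np

def projection_loss(pred_mask, box):
    """Unsupervised projection loss: the row/column maxima of the predicted
    soft mask should match the bounding box's indicator along each axis,
    letting boxes supervise masks without pixel-wise labels."""
    x0, y0, x1, y1 = box
    h, w = pred_mask.shape
    row_t = np.zeros(h); row_t[y0:y1 + 1] = 1.0  # box projected onto rows
    col_t = np.zeros(w); col_t[x0:x1 + 1] = 1.0  # box projected onto cols
    row_p = pred_mask.max(axis=1)                # mask projected onto rows
    col_p = pred_mask.max(axis=0)
    return np.mean((row_p - row_t) ** 2) + np.mean((col_p - col_t) ** 2)

# A mask that exactly fills the box projections incurs zero loss.
mask = np.zeros((16, 16)); mask[4:9, 3:11] = 1.0
print(projection_loss(mask, (3, 4, 10, 8)))  # -> 0.0
```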
Fully supervised change detection methods have achieved significant advancements in performance, yet they depend severely on acquiring costly pixel-level labels. Considering that patch-level annotations also contain abundant information corresponding to both changed and unchanged objects in bi-temporal images, an intuitive solution is to segment the changes from patch-level annotations. How to capture the semantic variations associated with the changed and unchanged regions from the patch-level annotations to obtain promising change results is the critical challenge for the weakly supervised change detection task. In this paper, we propose a memory-supported transformer (MS-Former), a novel framework consisting of a bi-directional attention block (BAB) and a patch-level supervision scheme (PSS) tailored for weakly supervised change detection with patch-level annotations. More specifically, the BAB captures contexts associated with the changed and unchanged regions from the temporal difference features to construct informative prototypes stored in the memory bank. In turn, the BAB extracts useful information from the prototypes as supplementary contexts to enhance the temporal difference features, thereby better distinguishing changed and unchanged regions. After that, the PSS guides the network in learning valuable knowledge from the patch-level annotations, further elevating the performance. Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method on the change detection task. The demo code for our work will be publicly available at \url{this https URL}.
https://arxiv.org/abs/2311.09726
Chest X-Ray (CXR) examination is a common method for assessing thoracic diseases in clinical applications. While recent advances in deep learning have enhanced the significance of visual analysis for CXR anomaly detection, current methods often miss key cues in anomaly images that are crucial for identifying disease regions, as they predominantly rely on unsupervised training with normal images. This letter focuses on a more practical setup in which few-shot anomaly images with only image-level labels are available during training. For this purpose, we propose WSCXR, a weakly supervised anomaly detection framework for CXR. WSCXR first constructs sets of normal and anomaly image features respectively. It then refines the anomaly image features by eliminating normal region features through anomaly feature mining, thus fully leveraging the scarce yet crucial features of diseased areas. Additionally, WSCXR employs a linear mixing strategy to augment the anomaly features, facilitating the training of the anomaly detector with few-shot anomaly images. Experiments on two CXR datasets demonstrate the effectiveness of our approach.
https://arxiv.org/abs/2311.09642
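The two WSCXR steps described above, mining anomaly features away from the normal feature bank and then linearly mixing them, can be sketched as follows. This is a simplified reading: the nearest-neighbour distance criterion, the median cutoff, and the mixing range are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def mine_anomaly_features(anom_feats, normal_bank, quantile=0.5):
    """Keep only anomaly-image features far from the normal feature bank
    (nearest-neighbour distance above a quantile cutoff), then augment
    them by linear mixing of the surviving features."""
    d = np.linalg.norm(anom_feats[:, None, :] - normal_bank[None, :, :], axis=-1)
    nn = d.min(axis=1)  # distance of each feature to its nearest normal feature
    kept = anom_feats[nn > np.quantile(nn, quantile)]
    # Linear mixing: convex combinations of kept features enlarge the set.
    lam = np.random.default_rng(0).uniform(0.2, 0.8, size=len(kept))
    mixed = lam[:, None] * kept + (1 - lam)[:, None] * np.roll(kept, 1, axis=0)
    return np.vstack([kept, mixed])

rng = np.random.default_rng(0)
normal_bank = rng.normal(0, 1, (100, 8))
anom = np.vstack([rng.normal(0, 1, (10, 8)),   # normal-looking regions
                  rng.normal(5, 1, (10, 8))])  # truly diseased regions
aug = mine_anomaly_features(anom, normal_bank)
```

The filtering discards anomaly-image features that actually depict healthy tissue, so only the scarce diseased-region features (and their mixtures) train the detector.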
Vast amounts of astronomical photometric data are generated by various projects, requiring significant effort to identify variable stars and other object classes. In light of this, a general, widely applicable classification framework would simplify the task of designing custom classifiers. We present a novel deep learning framework for classifying light curves using a weakly supervised object detection model. Our framework identifies the optimal windows for both light curves and power spectra automatically, and zooms in on their corresponding data. This allows for automatic feature extraction from both the time and frequency domains, enabling our model to handle data across different scales and sampling intervals. We train our model on datasets obtained from both space-based and ground-based multi-band observations of variable stars and transients. We achieve an accuracy of 87% for combined variables and transient events, which is comparable to the performance of previous feature-based models. Our trained model can be applied directly to other missions, such as ASAS-SN, without any retraining or fine-tuning. To address known issues with miscalibrated predictive probabilities, we apply conformal prediction to generate robust predictive sets that guarantee true label coverage with a given probability. Additionally, we incorporate various anomaly detection algorithms to empower our model with the ability to identify out-of-distribution objects. Our framework is implemented in the Deep-LC toolkit, an open-source Python package hosted on GitHub and PyPI.
https://arxiv.org/abs/2311.08080
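The conformal prediction step mentioned above follows a standard split-conformal recipe, sketched below on toy softmax outputs. This is the generic method, not Deep-LC's internals; the nonconformity score (one minus the true-class probability) is a common illustrative choice.

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction: calibrate a threshold on held-out softmax
    scores so that prediction sets cover the true class with probability
    at least 1 - alpha (marginally, over calibration/test exchangeability)."""
    n = len(cal_labels)
    # Nonconformity: one minus the softmax score of the true class.
    nonconf = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(nonconf, level, method="higher")
    return [np.nonzero(p >= 1.0 - q)[0] for p in test_probs]

# Toy calibration set over 3 classes (true class is class 0 throughout).
cal_probs = np.array([
    [0.90, 0.05, 0.05], [0.85, 0.10, 0.05], [0.80, 0.10, 0.10],
    [0.75, 0.15, 0.10], [0.70, 0.20, 0.10], [0.65, 0.25, 0.10],
    [0.60, 0.30, 0.10], [0.50, 0.30, 0.20], [0.40, 0.35, 0.25],
])
cal_labels = np.zeros(9, dtype=int)
test_probs = np.array([[0.90, 0.05, 0.05],   # confident: singleton set
                       [0.45, 0.41, 0.14]])  # ambiguous: two-class set
sets = conformal_sets(cal_probs, cal_labels, test_probs)
```

Ambiguous inputs naturally yield larger sets, which is exactly the honest-uncertainty behaviour miscalibrated softmax probabilities fail to provide.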
Teeth segmentation is an essential task in dental image analysis for accurate diagnosis and treatment planning. While supervised deep learning methods can be utilized for teeth segmentation, they often require extensive manual annotation of segmentation masks, which is time-consuming and costly. In this research, we propose a weakly supervised approach for teeth segmentation that reduces the need for manual annotation. Our method utilizes the output heatmaps and intermediate feature maps from a keypoint detection network to guide the segmentation process. We introduce the TriDental dataset, consisting of 3000 oral cavity images annotated with teeth keypoints, to train a teeth keypoint detection network. We combine feature maps from different layers of the keypoint detection network, enabling accurate teeth segmentation without explicit segmentation annotations. The detected keypoints are also used for further refinement of the segmentation masks. Experimental results on the TriDental dataset demonstrate the superiority of our approach in terms of accuracy and robustness compared to state-of-the-art segmentation methods. Our method offers a cost-effective and efficient solution for teeth segmentation in real-world dental applications, eliminating the need for extensive manual annotation efforts.
https://arxiv.org/abs/2311.07398
Causal disentanglement has great potential for capturing complex situations. However, practical and efficient approaches are lacking. It is already known that most unsupervised disentangling methods are unable to produce identifiable results without additional information, often leading to randomly disentangled output. Most existing models for disentangling are therefore weakly supervised, providing information about intrinsic factors, which incurs excessive costs. To address this, we propose a novel model, SCADI (SElf-supervised CAusal DIsentanglement), that discovers semantic factors and learns their causal relationships without any supervision. The model combines a masked structural causal model (SCM) with a pseudo-label generator for causal disentanglement, aiming to provide a new direction for self-supervised causal disentanglement models.
https://arxiv.org/abs/2311.06567
With the surge in available data from various modalities, there is a growing need to bridge the gap between different data types. In this work, we introduce a novel approach to learn cross-modal representations between image data and molecular representations for drug discovery. We propose EMM and IMM, two innovative loss functions built on top of CLIP that leverage weak supervision and cross-site replicates in High-Content Screening. Evaluating our model against known baselines on cross-modal retrieval, we show that our proposed approach learns better representations and mitigates batch effects. In addition, we present a preprocessing method for the JUMP-CP dataset that effectively reduces the required storage from 85 TB to a usable 7 TB, while retaining all perturbations and most of the information content.
https://arxiv.org/abs/2311.04678
This paper investigates the combination of intensity-based distance maps with boundary loss for point-supervised semantic segmentation. By design the boundary loss imposes a stronger penalty on the false positives the farther away from the object they occur. Hence it is intuitively inappropriate for weak supervision, where the ground truth label may be much smaller than the actual object and a certain amount of false positives (w.r.t. the weak ground truth) is actually desirable. Using intensity-aware distances instead may alleviate this drawback, allowing for a certain amount of false positives without a significant increase to the training loss. The motivation for applying the boundary loss directly under weak supervision lies in its great success for fully supervised segmentation tasks, but also in not requiring extra priors or outside information that is usually required -- in some form -- with existing weakly supervised methods in the literature. This formulation also remains potentially more attractive than existing CRF-based regularizers, due to its simplicity and computational efficiency. We perform experiments on two multi-class datasets; ACDC (heart segmentation) and POEM (whole-body abdominal organ segmentation). Preliminary results are encouraging and show that this supervision strategy has great potential. On ACDC it outperforms the CRF-loss based approach, and on POEM data it performs on par with it. The code for all our experiments is openly available.
https://arxiv.org/abs/2311.03537
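The intensity-aware idea above can be illustrated in 1-D: instead of counting pixels, accumulate absolute intensity differences when walking away from the point label, so pixels that look like the object stay "close" even when spatially distant, and false positives there stay cheap under the boundary loss. This is a hedged simplification of the paper's distance maps; the function names and the 1-D setting are illustrative.

```python
import numpy as np

def intensity_aware_dist(intensity, seed_idx):
    """1-D intensity-aware (geodesic-like) distance from a point label:
    accumulate absolute intensity differences while walking away from the
    seed, so object-like pixels far from the seed still get distance ~0."""
    diffs = np.abs(np.diff(intensity))
    d = np.zeros_like(intensity, dtype=float)
    for i in range(seed_idx + 1, len(intensity)):
        d[i] = d[i - 1] + diffs[i - 1]
    for i in range(seed_idx - 1, -1, -1):
        d[i] = d[i + 1] + diffs[i]
    return d

def boundary_loss(probs, dist_map):
    """Boundary loss: predicted foreground probability weighted by the
    distance map, so false positives far (in distance-map terms) from
    the ground truth are penalized more."""
    return float(np.sum(probs * dist_map))

# A bright object spans indices 3..8; the point label sits at index 5.
intensity = np.array([0., 0., 0., 1., 1., 1., 1., 1., 1., 0., 0., 0.])
d = intensity_aware_dist(intensity, seed_idx=5)
# Predicting the whole bright object as foreground costs nothing, even
# though most of it lies outside the single labeled point.
print(boundary_loss(intensity, d))  # -> 0.0
```

With a plain Euclidean distance map, the same whole-object prediction would be penalized everywhere except at the seed, which is the drawback the intensity-aware variant alleviates.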
Clinical neuroimaging data is naturally hierarchical. Different magnetic resonance imaging (MRI) sequences within a series, different slices covering the head, and different regions within each slice all confer different information. In this work we present a hierarchical attention network for abnormality detection using MRI scans obtained in a clinical hospital setting. The proposed network is suitable for non-volumetric data (i.e. stacks of high-resolution MRI slices), and can be trained from binary examination-level labels. We show that this hierarchical approach leads to improved classification, while providing interpretability through either coarse inter- and intra-slice abnormality localisation, or giving importance scores for different slices and sequences, making our model suitable for use as an automated triaging system in radiology departments.
https://arxiv.org/abs/2311.02992