Detection of malignant lesions in mammography images is extremely important for early breast cancer diagnosis. In clinical practice, images are acquired from two different angles, and radiologists can fully utilize information from both views, simultaneously locating the same lesion. However, for automatic detection approaches, such information fusion remains a challenge. In this paper, we propose a new model called MAMM-Net, which processes both mammography views simultaneously by sharing information not only at the object level, as in existing works, but also at the feature level. MAMM-Net's key component is the Fusion Layer, based on deformable attention and designed to increase detection precision while keeping recall high. Our experiments show superior performance on the public DDSM dataset compared to the previous state-of-the-art model, while introducing helpful new features such as pixel-level lesion annotation and classification of lesion malignancy.
https://arxiv.org/abs/2404.16718
Modeling non-stationary data is a challenging problem in the field of continual learning, and data distribution shifts may degrade the performance of a machine learning model. Classic learning tools are often vulnerable to perturbations of the input covariates, sensitive to outliers and noise, and in some cases built on rigid algebraic assumptions. Distribution shifts frequently occur due to changes in raw materials for production, seasonality, a different user base, or even adversarial attacks. Therefore, there is a need for more effective distribution shift detection techniques. In this work, we propose a continual learning framework for monitoring and detecting distribution changes. We explore the problem in a latent space generated by bio-inspired self-organizing clustering, together with statistical properties of that latent space. In particular, we investigate the projections made by two topology-preserving maps: the Self-Organizing Map and the Scale Invariant Map. Our method can be applied in both a supervised and an unsupervised context. We cast the assessment of changes in the data distribution as a comparison of Gaussian signals, making the proposed method fast and robust. We compare it to other unsupervised techniques, specifically Principal Component Analysis (PCA) and Kernel-PCA. Our comparison involves experiments on sequences of images (based on MNIST, with shifts injected via adversarial samples), chemical sensor measurements, and an environmental variable related to ozone levels. The empirical study reveals the potential of the proposed approach.
https://arxiv.org/abs/2404.16656
Current state-of-the-art two-stage models for instance segmentation suffer from several types of imbalance. In this paper, we address the Intersection over Union (IoU) distribution imbalance of positive input Regions of Interest (RoIs) during the training of the second stage. Our Self-Balanced R-CNN (SBR-CNN), an evolved version of the Hybrid Task Cascade (HTC) model, introduces new loop mechanisms for bounding box and mask refinement. With an improved Generic RoI Extraction (GRoIE), we also address the feature-level imbalance at the Feature Pyramid Network (FPN) level, which originates from a non-uniform integration of low- and high-level features from the backbone layers. In addition, redesigning the architecture heads toward a fully convolutional approach with FCC further reduces the number of parameters and yields more insight into the connection between the task to solve and the layers used. Moreover, our SBR-CNN model shows the same or even better improvements when adopted in conjunction with other state-of-the-art models. In fact, with a lightweight ResNet-50 backbone, evaluated on the COCO minival 2017 dataset, our model reaches 45.3% and 41.5% AP for object detection and instance segmentation, respectively, with 12 epochs and without extra tricks. The code is available at this https URL
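For reference, the IoU quantity whose distribution over positive RoIs is rebalanced above has the standard definition sketched below; this generic helper is illustrative only and not part of SBR-CNN itself.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2).

    Standard definition: area of overlap divided by area of union.
    """
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A positive RoI's IoU with its ground-truth box determines where it falls in the distribution the paper seeks to balance during second-stage training.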
https://arxiv.org/abs/2404.16633
Low-shot counters estimate the number of objects belonging to a selected category based on only a few or no exemplars annotated in the image. The current state-of-the-art estimates the total count as the sum over an object location density map, but does not provide individual object locations and sizes, which are crucial for many applications. This is addressed by detection-based counters, which, however, fall behind in total count accuracy. Furthermore, both approaches tend to overestimate the counts in the presence of other object classes due to many false positives. We propose DAVE, a low-shot counter based on a detect-and-verify paradigm that avoids the aforementioned issues by first generating a high-recall detection set and then verifying the detections to identify and remove the outliers. This jointly increases recall and precision, leading to accurate counts. DAVE outperforms the top density-based counters by ~20% in total count MAE, outperforms the most recent detection-based counter by ~20% in detection quality, and sets a new state-of-the-art in zero-shot as well as text-prompt-based counting.
https://arxiv.org/abs/2404.16622
Spatiotemporal action localization in chaotic scenes is a challenging task on the way to advanced video understanding. High-quality video feature extraction and improved precision of detector-predicted anchors can effectively improve model performance. To this end, we propose SFMViT, a high-performance dual-stream spatiotemporal feature extraction network with an anchor pruning strategy. The backbone of SFMViT is composed of ViT and SlowFast with prior knowledge of spatiotemporal action localization, fully utilizing ViT's excellent global feature extraction capabilities and SlowFast's spatiotemporal sequence modeling capabilities. Second, we introduce a confidence max-heap to prune the anchors detected in each frame and filter out the effective ones. These designs enable SFMViT to achieve a mAP of 26.62% on the Chaotic World dataset, far exceeding existing models. Code is available at this https URL.
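Confidence-based anchor pruning with a max-heap can be sketched as follows; the `prune_anchors` helper and the keep-top-k policy are hypothetical simplifications for illustration, not the paper's exact procedure.

```python
import heapq

def prune_anchors(anchors, k):
    """Keep the k highest-confidence anchors for one frame using a heap.

    `anchors` is a list of (confidence, box) pairs. Python's heapq is a
    min-heap, so confidences are negated to pop the largest first.
    """
    heap = [(-conf, i) for i, (conf, _) in enumerate(anchors)]
    heapq.heapify(heap)
    kept = []
    for _ in range(min(k, len(heap))):
        _, i = heapq.heappop(heap)
        kept.append(anchors[i])
    return kept
```

Per-frame pruning like this discards low-confidence anchors before they reach later stages, which is the spirit of the filtering step described above.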
https://arxiv.org/abs/2404.16609
In this paper, we propose a novel approach to camera and radar sensor fusion for 3D object detection in autonomous vehicle perception systems. Our approach builds on recent advances in deep learning and leverages the strengths of both sensors to improve object detection performance. Specifically, we extract 2D features from camera images using a state-of-the-art deep learning architecture and then apply a novel Cross-Domain Spatial Matching (CDSM) transformation to convert these features into 3D space. We then fuse them with extracted radar data using a complementary fusion strategy to produce a final 3D object representation. To demonstrate the effectiveness of our approach, we evaluate it on the nuScenes dataset. We compare our approach to both single-sensor performance and current state-of-the-art fusion methods. Our results show that the proposed approach achieves superior performance over single-sensor solutions and competes directly with other top-level fusion methods.
https://arxiv.org/abs/2404.16548
The prevalent approaches to unsupervised 3D object detection follow cluster-based pseudo-label generation and iterative self-training processes. However, a challenge arises from the sparsity of LiDAR scans, which leads to pseudo-labels with erroneous size and position, resulting in subpar detection performance. To tackle this problem, this paper introduces a Commonsense Prototype-based Detector, termed CPD, for unsupervised 3D object detection. CPD first constructs a Commonsense Prototype (CProto) characterized by a high-quality bounding box and dense points, based on commonsense intuition. Subsequently, CPD refines low-quality pseudo-labels by leveraging the size prior from CProto. Furthermore, CPD enhances the detection accuracy of sparsely scanned objects using the geometric knowledge from CProto. CPD outperforms state-of-the-art unsupervised 3D detectors on the Waymo Open Dataset (WOD), PandaSet, and KITTI datasets by a large margin. Besides, when trained on WOD and tested on KITTI, CPD attains 90.85% and 81.01% 3D Average Precision on the easy and moderate car classes, respectively. These achievements position CPD in close proximity to fully supervised detectors, highlighting the significance of our method. The code will be available at this https URL.
https://arxiv.org/abs/2404.16493
Adversarial patch attacks present a significant threat to real-world object detectors due to their practical feasibility. Existing defense methods, which rely on attack data or prior knowledge, struggle to effectively address a wide range of adversarial patches. In this paper, we identify two inherent characteristics of adversarial patches, semantic independence and spatial heterogeneity, that hold regardless of their appearance, shape, size, quantity, and location. Semantic independence indicates that adversarial patches operate autonomously within their semantic context, while spatial heterogeneity manifests as an image quality in the patch area that is distinct from the original clean image, owing to the patch's independent generation process. Based on these observations, we propose PAD, a novel adversarial patch localization and removal method that requires neither prior knowledge nor additional training. PAD offers patch-agnostic defense against various adversarial patches and is compatible with any pre-trained object detector. Our comprehensive digital and physical experiments involving diverse patch types, such as localized noise, printable, and naturalistic patches, exhibit notable improvements over state-of-the-art works. Our code is available at this https URL.
https://arxiv.org/abs/2404.16452
Unsupervised graph anomaly detection aims at identifying rare patterns that deviate from the majority in a graph without the aid of labels, which is important for a variety of real-world applications. Recent advances have utilized Graph Neural Networks (GNNs) to learn effective node representations by aggregating information from neighborhoods. This is motivated by the hypothesis that nodes in a graph tend to exhibit behaviors consistent with their neighborhoods. However, such consistency can be disrupted by graph anomalies in multiple ways. Most existing methods directly employ GNNs to learn representations, disregarding the negative impact of graph anomalies on GNNs, which results in sub-optimal node representations and anomaly detection performance. While a few recent approaches have redesigned GNNs for graph anomaly detection under semi-supervised label guidance, addressing the adverse effects of graph anomalies on GNNs in unsupervised scenarios and learning effective representations for anomaly detection remain under-explored. To bridge this gap, in this paper we propose a simple yet effective framework for Guarding Graph Neural Networks for Unsupervised Graph Anomaly Detection (G3AD). Specifically, G3AD introduces two auxiliary networks along with correlation constraints to guard the GNNs from inconsistent information encoding. Furthermore, G3AD introduces an adaptive caching module to guard the GNNs from solely reconstructing the observed data that contain anomalies. Extensive experiments demonstrate that our proposed G3AD can outperform seventeen state-of-the-art methods on both synthetic and real-world datasets.
https://arxiv.org/abs/2404.16366
Model Weight Averaging (MWA) is a technique that seeks to enhance a model's performance by averaging the weights of multiple trained models. This paper first empirically finds that 1) vanilla MWA can benefit class-imbalanced learning, and 2) performing model averaging in the early epochs of training yields a greater performance improvement than doing so in later epochs. Inspired by these two observations, we propose a novel MWA technique for class-imbalanced learning tasks named Iterative Model Weight Averaging (IMWA). Specifically, IMWA divides the entire training stage into multiple episodes. Within each episode, multiple models are concurrently trained from the same initial model weights and subsequently averaged into a single model. The weights of this averaged model then serve as a fresh initialization for the ensuing episode, establishing an iterative learning paradigm. Compared to vanilla MWA, IMWA achieves higher performance improvements at the same computational cost. Moreover, IMWA can further enhance the performance of methods employing an EMA strategy, demonstrating that IMWA and EMA can complement each other. Extensive experiments on various class-imbalanced learning tasks, i.e., class-imbalanced image classification, semi-supervised class-imbalanced image classification, and semi-supervised object detection, showcase the effectiveness of our IMWA.
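The episode loop described above can be sketched as follows; the plain-dict weight representation and the `train_fn(weights, seed)` training routine are illustrative assumptions standing in for real model state and training, not the paper's implementation.

```python
import copy

def average_weights(state_dicts):
    # Element-wise mean of parameter values across models (plain floats here).
    keys = state_dicts[0].keys()
    return {k: sum(sd[k] for sd in state_dicts) / len(state_dicts) for k in keys}

def imwa(init_weights, train_fn, num_episodes, num_models):
    """Iterative Model Weight Averaging sketch.

    Per episode: train several copies from the same initialization,
    average their weights, and use the average to initialize the next episode.
    """
    weights = init_weights
    for _ in range(num_episodes):
        trained = [train_fn(copy.deepcopy(weights), seed) for seed in range(num_models)]
        weights = average_weights(trained)
    return weights
```

Vanilla MWA corresponds to a single episode; IMWA's gain comes from repeating the train-then-average cycle with the averaged weights as the new starting point.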
https://arxiv.org/abs/2404.16331
Robot swarms hold immense potential for performing complex tasks far beyond the capabilities of individual robots. However, the challenge in unleashing this potential is the robots' limited sensory capabilities, which hinder their ability to detect and adapt to unknown obstacles in real time. To overcome this limitation, we introduce a novel robot swarm control method with an indirect obstacle detector based on a smoothed particle hydrodynamics (SPH) model. The indirect obstacle detector predicts a collision with an obstacle, and its collision point, solely from the robot's velocity information. This approach enables the swarm to navigate environments effectively and accurately without explicit obstacle detection, significantly enhancing operational robustness and efficiency. Our method's superiority is quantitatively validated through a comparative analysis showcasing significant improvements in navigation and pattern formation under obstacle-unaware conditions.
https://arxiv.org/abs/2404.16309
Lane detection has made significant progress in recent years, but there is no unified architecture for its two sub-tasks: 2D lane detection and 3D lane detection. To fill this gap, we introduce BézierFormer, a unified 2D and 3D lane detection architecture based on a Bézier curve lane representation. BézierFormer formulates queries as Bézier control points and incorporates a novel Bézier curve attention mechanism. This attention mechanism enables comprehensive and accurate feature extraction for slender lane curves by sampling and fusing multiple reference points on each curve. In addition, we propose a novel Chamfer-IoU-based loss that is better suited to Bézier control point regression. The state-of-the-art performance of BézierFormer on widely used 2D and 3D lane detection benchmarks verifies its effectiveness and suggests that further exploration is worthwhile.
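As background on the curve representation, a Bézier curve defined by its control points can be evaluated and sampled as below (De Casteljau's algorithm); the `sample_lane` helper is a generic sketch, not BézierFormer's exact reference-point sampling scheme.

```python
def bezier_point(control_points, t):
    """Evaluate a Bézier curve at parameter t in [0, 1] via De Casteljau.

    `control_points` is a list of (x, y) tuples; repeated linear
    interpolation between consecutive points reduces the list to one point.
    """
    pts = list(control_points)
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]

def sample_lane(control_points, n=8):
    # Sample n reference points uniformly in parameter space along the curve.
    return [bezier_point(control_points, i / (n - 1)) for i in range(n)]
```

A lane is thus fully described by a handful of control points, which is what makes them a compact query representation for both 2D and 3D detection.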
https://arxiv.org/abs/2404.16304
Cross-modality images that integrate visible and infrared spectral cues can provide richer complementary information for object detection. Despite this, existing visible-infrared object detection methods degrade severely in harsh weather. This failure stems from the pronounced sensitivity of visible images to environmental perturbations such as rain, haze, and snow, which frequently cause false negatives and false positives in detection. To address this issue, we introduce a novel and challenging task, termed visible-infrared object detection under adverse weather conditions. To foster this task, we have constructed a new Severe Weather Visible-Infrared Dataset (SWVID) with diverse severe weather scenes. Furthermore, we introduce the Cross-modality Fusion Mamba with Weather-removal (CFMW) to augment detection accuracy in adverse weather conditions. Thanks to the proposed Weather Removal Diffusion Model (WRDM) and Cross-modality Fusion Mamba (CFM) modules, CFMW is able to mine more essential information from pedestrian features in cross-modality fusion, and can thus transfer efficiently to rarer scenarios and remain usable on platforms with low computing power. To the best of our knowledge, this is the first study to integrate both Diffusion and Mamba modules in cross-modality object detection, successfully expanding the practical application of this type of model with higher accuracy and a more advanced architecture. Extensive experiments on both well-recognized and self-created datasets conclusively demonstrate that our CFMW achieves state-of-the-art detection performance, surpassing existing benchmarks. The dataset and source code will be made publicly available at this https URL.
https://arxiv.org/abs/2404.16302
Fuzzing, a widely used technique for bug detection, has seen advancements through Large Language Models (LLMs). Despite their potential, LLMs face specific challenges in fuzzing. In this paper, we identify five major challenges of LLM-assisted fuzzing. To support our findings, we revisited the most recent papers from top-tier conferences, confirming that these challenges are widespread. As a remedy, we propose actionable recommendations to improve the application of LLMs in fuzzing and conduct preliminary evaluations on DBMS fuzzing. The results demonstrate that our recommendations effectively address the identified challenges.
https://arxiv.org/abs/2404.16297
With the development and widespread application of digital image processing technology, image splicing has become a common method of image manipulation, raising numerous security and legal issues. This paper introduces a new splicing image detection algorithm based on the statistical characteristics of natural images, aimed at improving the accuracy and efficiency of splicing image detection. By analyzing the limitations of traditional methods, we have developed a detection framework that integrates advanced statistical analysis techniques and machine learning methods. The algorithm has been validated using multiple public datasets, showing high accuracy in detecting spliced edges and locating tampered areas, as well as good robustness. Additionally, we explore the potential applications and challenges faced by the algorithm in real-world scenarios. This research not only provides an effective technological means for the field of image tampering detection but also offers new ideas and methods for future related research.
https://arxiv.org/abs/2404.16296
AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundation models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite of functionalities spanning classification, regression, object detection, semantic matching, and image segmentation. Experiments across diverse datasets and tasks showcase AutoMM's superior performance in basic classification and regression tasks compared to existing AutoML tools, while also demonstrating competitive results in advanced tasks, in line with specialized toolboxes designed for such purposes.
https://arxiv.org/abs/2404.16233
Deepfake or synthetic images produced using deep generative models pose serious risks to online platforms. This has triggered several research efforts to accurately detect deepfake images, achieving excellent performance on publicly available deepfake datasets. In this work, we study 8 state-of-the-art detectors and argue that they are far from being ready for deployment due to two recent developments. First, the emergence of lightweight methods to customize large generative models can enable an attacker to create many customized generators (to create deepfakes), thereby substantially increasing the threat surface. We show that existing defenses fail to generalize well to such user-customized generative models that are publicly available today. We discuss new machine learning approaches based on content-agnostic features, and ensemble modeling to improve generalization performance against user-customized models. Second, the emergence of vision foundation models -- machine learning models trained on broad data that can be easily adapted to several downstream tasks -- can be misused by attackers to craft adversarial deepfakes that can evade existing defenses. We propose a simple adversarial attack that leverages existing foundation models to craft adversarial samples without adding any adversarial noise, through careful semantic manipulation of the image content. We highlight the vulnerabilities of several defenses against our attack, and explore directions leveraging advanced foundation models and adversarial training to defend against this new threat.
https://arxiv.org/abs/2404.16212
Anomaly detection in industrial systems is crucial for preventing equipment failures, ensuring risk identification, and maintaining overall system efficiency. Traditional monitoring methods often rely on fixed thresholds and empirical rules, which may not be sensitive enough to detect subtle changes in system health and predict impending failures. To address this limitation, this paper proposes a novel Attention-Based Convolutional autoencoder (ABCD) for risk detection and maps the derived risk values to maintenance planning. ABCD learns the normal behavior of conductivity from historical data of a real-world industrial cooling system and reconstructs the input data, identifying anomalies that deviate from the expected patterns. The framework also employs calibration techniques to ensure the reliability of its predictions. Evaluation results demonstrate that the attention mechanism in ABCD yields a 57.4% increase in performance and a 9.37% reduction in false alarms compared to the variant without attention. The approach effectively detects risks and maps the risk priority ranking to maintenance, providing valuable insights for cooling system designers and service personnel. A calibration error of 0.03% indicates that the model is well calibrated, which enhances its trustworthiness and enables informed decisions about maintenance strategies.
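The reconstruction-based anomaly criterion at the core of such autoencoder approaches can be sketched as below; the autoencoder itself is not reproduced, and `flag_anomalies` with a fixed threshold is a simplified stand-in for ABCD's calibrated risk scoring.

```python
def flag_anomalies(values, reconstructed, threshold):
    """Flag time steps whose reconstruction error exceeds a threshold.

    `values` is the observed signal (e.g. conductivity readings) and
    `reconstructed` the autoencoder's reconstruction of the same steps;
    large absolute error marks a deviation from learned normal behavior.
    """
    return [i for i, (v, r) in enumerate(zip(values, reconstructed))
            if abs(v - r) > threshold]
```

In practice the threshold would be set (and calibrated) from the error distribution on normal historical data rather than fixed by hand.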
https://arxiv.org/abs/2404.16183
Purpose: This study explores the feasibility of using generative machine learning (ML) to translate Optical Coherence Tomography (OCT) images into Optical Coherence Tomography Angiography (OCTA) images, potentially bypassing the need for specialized OCTA hardware. Methods: The method involved implementing a generative adversarial network framework that includes a 2D vascular segmentation model and a 2D OCTA image translation model. The study utilizes a public dataset of 500 patients, divided into subsets based on resolution and disease status, to validate the quality of TR-OCTA images. The validation employs several quality and quantitative metrics to compare the translated images with ground-truth OCTAs (GT-OCTA). We then quantitatively compare vascular features generated in TR-OCTAs with those in GT-OCTAs to assess the feasibility of using TR-OCTA for objective disease diagnosis. Results: TR-OCTAs showed high image quality in both the 3 and 6 mm datasets (high resolution, with moderate structural similarity and contrast quality compared to GT-OCTAs). There were slight discrepancies in vascular metrics, especially in diseased patients. Blood vessel features like tortuosity and vessel perimeter index showed a better trend than density features, which are affected by local vascular distortions. Conclusion: This study presents a promising solution to the limitations of OCTA adoption in clinical practice by using vascular features from TR-OCTA for disease detection. Translational relevance: This study has the potential to significantly enhance the diagnostic process for retinal diseases by making detailed vascular imaging more widely available and reducing dependency on costly OCTA equipment.
https://arxiv.org/abs/2404.16133
We introduce the RetinaRegNet model, which achieves state-of-the-art performance across various retinal image registration tasks. RetinaRegNet does not require training on any retinal images. It begins by establishing point correspondences between two retinal images using image features derived from diffusion models. This process involves selecting feature points from the moving image using the SIFT algorithm alongside random point sampling. For each selected feature point, a 2D correlation map is computed by assessing the similarity between the feature vector at that point and the feature vectors of all pixels in the fixed image. The pixel with the highest similarity score in the correlation map corresponds to the feature point in the moving image. To remove outliers in the estimated point correspondences, we first apply an inverse consistency constraint, followed by a transformation-based outlier detector. This method outperforms the widely used random sample consensus (RANSAC) outlier detector by a significant margin. To handle large deformations, we utilize a two-stage image registration framework. A homography transformation is used in the first stage and a more accurate third-order polynomial transformation in the second stage. The model's effectiveness was demonstrated across three retinal image datasets: color fundus images, fluorescein angiography images, and laser speckle flowgraphy images. RetinaRegNet outperformed current state-of-the-art methods on all three datasets. It was especially effective for registering image pairs with large displacement and scaling deformations. This innovation holds promise for various applications in retinal image analysis. Our code is publicly available at this https URL.
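The inverse consistency filter can be sketched as follows; `fwd_match` and `bwd_match` are hypothetical matching functions standing in for the correlation-map lookups (moving-to-fixed and fixed-to-moving), and the pixel tolerance is an assumed parameter.

```python
def inverse_consistent(fwd_match, bwd_match, points, tol=2.0):
    """Keep only inverse-consistent correspondences.

    A point p in the moving image matched to q in the fixed image is kept
    only if matching q back lands within `tol` pixels of p; inconsistent
    pairs are treated as outliers and dropped.
    """
    kept = []
    for p in points:
        q = fwd_match(p)
        p_back = bwd_match(q)
        dist = ((p[0] - p_back[0]) ** 2 + (p[1] - p_back[1]) ** 2) ** 0.5
        if dist <= tol:
            kept.append((p, q))
    return kept
```

The surviving correspondences would then be passed to the transformation-based outlier detector described above for a second round of filtering.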
https://arxiv.org/abs/2404.16017