As cyber threats and malware attacks increasingly alarm both individuals and businesses, the urgency for proactive malware countermeasures intensifies. This has driven a rising interest in automated machine learning solutions. Transformers, a cutting-edge category of attention-based deep learning methods, have demonstrated remarkable success. In this paper, we present BERTroid, an innovative malware detection model built on the BERT architecture. We evaluate BERTroid on multiple datasets to assess its performance across diverse scenarios. Overall, BERTroid emerges as a promising solution for combating Android malware: its ability to outperform state-of-the-art solutions demonstrates its potential as a proactive defense mechanism against malicious software attacks. In the dynamic landscape of cybersecurity, our approach has demonstrated promising resilience against the rapid evolution of malware on Android systems. While the machine learning model captures broad patterns, we emphasize the role of manual validation for deeper comprehension of and insight into these behaviors. This human intervention is critical for discerning intricate and context-specific behaviors, thereby validating and reinforcing the model's findings.
https://arxiv.org/abs/2405.03620
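The abstract does not specify how app artifacts are encoded for BERT; a common setup is to serialize features such as permission strings into text and fine-tune a binary sequence classifier. Below is a minimal sketch with the Hugging Face `transformers` library, where the permission-string input format is an assumption, not necessarily BERTroid's actual encoding:

```python
# Hypothetical sketch: fine-tuning BERT for binary malware classification.
# Representing an app as a string of its requested permissions is an
# assumption, not necessarily BERTroid's actual input encoding.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

apps = ["SEND_SMS READ_CONTACTS INTERNET", "INTERNET ACCESS_NETWORK_STATE"]
labels = torch.tensor([1, 0])  # 1 = malware, 0 = benign

batch = tokenizer(apps, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)   # cross-entropy loss + logits
outputs.loss.backward()                   # one fine-tuning step (optimizer omitted)
print(outputs.logits.softmax(dim=-1))     # per-class probabilities
```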
Acoustic scene classification (ASC) is highly important in the real world. Recently, deep learning-based methods have been widely employed for ASC. However, these methods are not yet lightweight enough, and their performance is not satisfactory. To solve these problems, we propose a deep space separable distillation network. Firstly, the network performs high-low frequency decomposition on the log-mel spectrogram, significantly reducing computational complexity while maintaining model performance. Secondly, we specially design three lightweight operators for ASC, including Separable Convolution (SC), Orthonormal Separable Convolution (OSC), and Separable Partial Convolution (SPC). These operators exhibit highly efficient feature extraction capabilities in acoustic scene classification tasks. The experimental results demonstrate that the proposed method achieves a performance gain of 9.8% over currently popular deep learning methods, while also having a smaller parameter count and lower computational complexity.
https://arxiv.org/abs/2405.03567
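To make the two ingredients concrete, the sketch below splits a log-mel spectrogram into low- and high-frequency bands and applies a depthwise-separable convolution, a standard lightweight operator in the spirit of the paper's SC; the 50/50 band-split point, shapes, and shared branch are illustrative assumptions:

```python
# Illustrative sketch (PyTorch): high/low frequency split of a log-mel
# spectrogram plus a depthwise-separable convolution, a standard member of
# the "separable convolution" family the paper builds on.
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(8, 1, 128, 431)                # (batch, channel, mel bins, frames)
low, high = x[:, :, :64, :], x[:, :, 64:, :]   # assumed 50/50 band split
branch = SeparableConv2d(1, 16)                # shared branch, for brevity only
feats = torch.cat([branch(low), branch(high)], dim=2)
print(feats.shape)                             # torch.Size([8, 16, 128, 431])
```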
The rapid advancement in artificial intelligence (AI), particularly through deep neural networks, has catalyzed significant progress in fields such as vision and text processing. Nonetheless, the pursuit of AI systems that exhibit human-like reasoning and interpretability continues to pose a substantial challenge. The Neural-Symbolic paradigm, which integrates the deep learning prowess of neural networks with the reasoning capabilities of symbolic systems, presents a promising pathway toward developing more transparent and comprehensible AI systems. Within this paradigm, the Knowledge Graph (KG) emerges as a crucial element, offering a structured and dynamic method for representing knowledge through interconnected entities and relationships, predominantly utilizing the triple (subject, predicate, object). This paper explores recent advancements in neural-symbolic integration based on KG, elucidating how KG underpins this integration across three key categories: enhancing the reasoning and interpretability of neural networks through the incorporation of symbolic knowledge (Symbol for Neural), refining the completeness and accuracy of symbolic systems via neural network methodologies (Neural for Symbol), and facilitating their combined application in Hybrid Neural-Symbolic Integration. It highlights current trends and proposes directions for future research in the domain of Neural-Symbolic AI.
https://arxiv.org/abs/2405.03524
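As a concrete illustration of the triple-based representation the survey centers on, the sketch below stores (subject, predicate, object) triples and scores them with TransE-style embeddings (score = -||h + r - t||), one classic neural technique in the "Neural for Symbol" direction; the toy graph and embedding size are invented for illustration:

```python
# Illustrative sketch: (subject, predicate, object) triples scored with
# TransE-style embeddings, a classic "Neural for Symbol" method for KG
# completion. The toy graph and dimensions are assumptions.
import torch
import torch.nn as nn

triples = [("paris", "capital_of", "france"), ("berlin", "capital_of", "germany")]
entities = {e: i for i, e in enumerate({s for s, _, _ in triples} | {o for _, _, o in triples})}
relations = {"capital_of": 0}

dim = 16
ent_emb = nn.Embedding(len(entities), dim)
rel_emb = nn.Embedding(len(relations), dim)

def score(s, p, o):
    """TransE score: higher (less negative) means a more plausible triple."""
    h = ent_emb(torch.tensor(entities[s]))
    r = rel_emb(torch.tensor(relations[p]))
    t = ent_emb(torch.tensor(entities[o]))
    return -torch.norm(h + r - t, p=2)

print(score("paris", "capital_of", "france"))  # trainable via margin ranking loss
```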
Accurate classification of medical images is essential for modern diagnostics. Advances in deep learning have led clinicians to increasingly use sophisticated models to make faster and more accurate decisions, sometimes replacing human judgment. However, model development is costly and repetitive. Neural Architecture Search (NAS) provides a solution by automating the design of deep learning architectures. This paper presents ZO-DARTS+, a differentiable NAS algorithm that improves search efficiency through a novel method of generating sparse probabilities via bi-level optimization. Experiments on five public medical datasets show that ZO-DARTS+ matches the accuracy of state-of-the-art solutions while reducing search times by up to a factor of three.
https://arxiv.org/abs/2405.03462
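The abstract does not detail how ZO-DARTS+ generates its sparse probabilities; one standard way to turn architecture scores into sparse probability vectors (unlike softmax, which never yields exact zeros) is sparsemax, sketched below purely as an illustration of the idea:

```python
# Illustrative sketch: sparsemax (Martins & Astudillo, 2016), a standard way
# to map architecture scores to sparse probabilities. Whether ZO-DARTS+ uses
# this exact operator is an assumption; the paper's mechanism may differ.
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex, allowing exact zeros."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = z_sorted + (1.0 - cumsum) / k > 0
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1.0) / k_max
    return np.maximum(z - tau, 0.0)

scores = np.array([2.0, 1.5, 0.1, -1.0])   # e.g., scores of candidate operations
print(sparsemax(scores))                    # [0.75, 0.25, 0.0, 0.0]
```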
This paper explores how deep learning techniques can improve visual-based SLAM performance in challenging environments. By combining deep feature extraction and deep matching methods, we introduce a versatile hybrid visual SLAM system designed to enhance adaptability in challenging scenarios, such as low-light conditions, dynamic lighting, weak-texture areas, and severe jitter. Our system supports multiple modes, including monocular, stereo, monocular-inertial, and stereo-inertial configurations. We also analyze how visual SLAM can be combined with deep learning methods, to inform other research. Through extensive experiments on both public datasets and self-sampled data, we demonstrate the superiority of the SL-SLAM system over traditional approaches. The experimental results show that SL-SLAM outperforms state-of-the-art SLAM algorithms in terms of localization accuracy and tracking robustness. For the benefit of the community, we make the source code public at this https URL.
https://arxiv.org/abs/2405.03413
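As a minimal illustration of the "deep feature extraction + deep matching" front end (the actual networks in SL-SLAM are not reproduced here), the sketch below matches two sets of deep descriptors by mutual nearest neighbor, with random arrays standing in for network outputs:

```python
# Illustrative sketch: mutual-nearest-neighbor matching of deep descriptors,
# the kind of front end a learned-feature SLAM system relies on. The random
# arrays stand in for descriptors produced by a feature network.
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Return index pairs (i, j) where i and j are each other's nearest neighbor."""
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T                        # cosine similarity of all pairs
    nn_ab = sim.argmax(axis=1)           # best match in B for each A descriptor
    nn_ba = sim.argmax(axis=0)           # best match in A for each B descriptor
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

desc_a = np.random.randn(500, 256)       # e.g., 500 keypoint descriptors, frame A
desc_b = np.random.randn(480, 256)       # frame B
print(len(mutual_nn_matches(desc_a, desc_b)), "mutual matches")
```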
With the growing amount of astronomical data, there is an increasing need for automated data processing pipelines, which can extract scientific information from observation data without human intervention. A critical component of these pipelines is the image quality evaluation and masking algorithm, which evaluates image quality based on various factors such as cloud coverage, sky brightness, scattered light from the optical system, point spread function size and shape, and read-out noise. Occasionally, the algorithm requires masking of areas severely affected by noise. However, the algorithm often necessitates significant human intervention, reducing data processing efficiency. In this study, we present a deep learning based image quality evaluation algorithm that uses an autoencoder to learn features of high quality astronomical images. The trained autoencoder enables automatic evaluation of image quality and masking of noise-affected areas. We have evaluated the performance of our algorithm using two test cases: images with point spread functions of varying full width at half maximum, and images with complex backgrounds. In the first scenario, our algorithm could effectively identify variations of the point spread functions, which can provide valuable reference information for photometry. In the second scenario, our method could successfully mask regions affected by complex backgrounds, which could significantly increase the photometry accuracy. Our algorithm can be employed to automatically evaluate the image quality obtained by different sky survey projects, further increasing the speed and robustness of data processing pipelines.
https://arxiv.org/abs/2405.03408
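The core idea, scoring quality by how well an autoencoder trained on clean images reconstructs a new image and masking pixels with large reconstruction error, can be sketched as follows; the convolutional architecture and the 3-sigma threshold are illustrative assumptions, not the paper's exact design:

```python
# Illustrative sketch (PyTorch): reconstruction-error-based quality scoring
# and masking with a small convolutional autoencoder. Architecture and the
# 3-sigma mask threshold are assumptions.
import torch
import torch.nn as nn

autoencoder = nn.Sequential(               # would be trained on high-quality images
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
)

image = torch.randn(1, 1, 256, 256)        # stand-in for an observation frame
with torch.no_grad():
    error = (autoencoder(image) - image) ** 2
quality = 1.0 / (1.0 + error.mean().item())       # higher = better reconstruction
mask = error > error.mean() + 3 * error.std()     # flag noise-affected pixels
print(f"quality={quality:.3f}, masked={mask.float().mean():.1%} of pixels")
```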
In the field of low-light image enhancement, both traditional Retinex methods and advanced deep learning techniques such as Retinexformer have shown distinct advantages and limitations. Traditional Retinex methods, designed to mimic the human eye's perception of brightness and color, decompose images into illumination and reflection components but struggle with noise management and detail preservation under low light conditions. Retinexformer enhances illumination estimation through traditional self-attention mechanisms, but faces challenges with insufficient interpretability and suboptimal enhancement effects. To overcome these limitations, this paper introduces the RetinexMamba architecture. RetinexMamba not only captures the physical intuitiveness of traditional Retinex methods but also integrates the deep learning framework of Retinexformer, leveraging the computational efficiency of State Space Models (SSMs) to enhance processing speed. This architecture features innovative illumination estimators and damage restorer mechanisms that maintain image quality during enhancement. Moreover, RetinexMamba replaces the IG-MSA (Illumination-Guided Multi-Head Attention) in Retinexformer with a Fused-Attention mechanism, improving the model's interpretability. Experimental evaluations on the LOL dataset show that RetinexMamba outperforms existing deep learning approaches based on Retinex theory in both quantitative and qualitative metrics, confirming its effectiveness and superiority in enhancing low-light images.
https://arxiv.org/abs/2405.03349
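For context on the Retinex decomposition both method families start from (I = R ⊙ L: image = reflectance × illumination), a textbook single-pass sketch is below; the max-RGB illumination prior and the gamma value are common classical choices, not RetinexMamba's learned estimator or restorer:

```python
# Illustrative sketch: classic Retinex-style enhancement. The max-RGB
# illumination prior and gamma correction are textbook choices standing in
# for RetinexMamba's learned illumination estimator and damage restorer.
import numpy as np

def retinex_enhance(img, gamma=0.45, eps=1e-4):
    """img: float array in [0, 1], shape (H, W, 3). Returns enhanced image."""
    illumination = img.max(axis=2, keepdims=True)   # max-RGB prior for L
    reflectance = img / (illumination + eps)        # I = R * L  =>  R = I / L
    brighter = illumination ** gamma                # lift the dark illumination
    return np.clip(reflectance * brighter, 0.0, 1.0)

low_light = np.random.rand(64, 64, 3) * 0.2         # stand-in dark image
print(retinex_enhance(low_light).mean(), ">", low_light.mean())
```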
This study accelerates MR cholangiopancreatography (MRCP) acquisitions using deep learning-based (DL) reconstruction at 3T and 0.55T. Thirty healthy volunteers underwent conventional two-fold accelerated MRCP scans at field strengths of 3T or 0.55T. We trained a variational network (VN) using retrospectively six-fold undersampled data obtained at 3T. We then evaluated our method against standard techniques such as parallel imaging (PI) and compressed sensing (CS), using peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) as metrics. Furthermore, since acquiring fully sampled MRCP is impractical, we added a self-supervised DL reconstruction (SSDU) to the comparison group. We also tested our method in a prospectively accelerated scenario to reflect real-world clinical applications and evaluated its adaptability to MRCP at 0.55T. Our method demonstrated a remarkable reduction of the average acquisition time from 599/542 to 255/180 seconds for MRCP at 3T/0.55T. In both retrospective and prospective undersampling scenarios, the PSNR and SSIM of VN were higher than those of PI, CS, and SSDU. At the same time, VN preserved the image quality of the undersampled data, i.e., sharpness and the visibility of hepatobiliary ducts. In addition, VN produced high-quality reconstructions at 0.55T, resulting in the highest PSNR and SSIM. In summary, a VN trained for highly accelerated MRCP reduces the acquisition time by a factor of 2.4/3.0 at 3T/0.55T while maintaining the image quality of the conventional acquisition.
https://arxiv.org/abs/2405.03732
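Since the comparison hinges on PSNR and SSIM, the sketch below computes both for a reconstruction against a reference, using NumPy for PSNR and scikit-image for SSIM; the random arrays stand in for reconstructed and reference MRCP slices:

```python
# Illustrative sketch: the two evaluation metrics used in the study, computed
# on stand-in arrays (random data in place of reference/reconstructed slices).
import numpy as np
from skimage.metrics import structural_similarity

def psnr(reference, test, data_range=1.0):
    mse = np.mean((reference - test) ** 2)
    return 10 * np.log10(data_range ** 2 / mse)

reference = np.random.rand(320, 320)
recon = np.clip(reference + 0.02 * np.random.randn(320, 320), 0, 1)

print(f"PSNR = {psnr(reference, recon):.2f} dB")
print(f"SSIM = {structural_similarity(reference, recon, data_range=1.0):.4f}")
```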
Point cloud registration aligns 3D point clouds using spatial transformations. It is an important task in computer vision, with applications in areas such as augmented reality (AR) and medical imaging. This work explores the intersection of two research trends: the integration of AR into image-guided surgery and the use of deep learning for point cloud registration. The main objective is to evaluate the feasibility of applying deep learning-based point cloud registration methods for image-to-patient registration in augmented reality-guided surgery. We created a dataset of point clouds from medical imaging and corresponding point clouds captured with a popular AR device, the HoloLens 2. We evaluate three well-established deep learning models in registering these data pairs. While we find that some deep learning methods show promise, we show that a conventional registration pipeline still outperforms them on our challenging dataset.
https://arxiv.org/abs/2405.03314
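To give a sense of what the winning "conventional registration pipeline" looks like in practice (the paper's exact pipeline is not reproduced here), below is a minimal ICP refinement with Open3D; the synthetic clouds, distance threshold, and identity initialization are assumptions:

```python
# Illustrative sketch (Open3D): a conventional registration baseline of the
# kind the paper finds still outperforms learned methods - here, simple
# point-to-point ICP from an initial guess. Data and thresholds are assumptions.
import numpy as np
import open3d as o3d

source = o3d.geometry.PointCloud()
source.points = o3d.utility.Vector3dVector(np.random.rand(2000, 3))  # e.g., imaging surface
target = o3d.geometry.PointCloud()
target.points = o3d.utility.Vector3dVector(np.random.rand(2000, 3))  # e.g., HoloLens 2 scan

result = o3d.pipelines.registration.registration_icp(
    source, target,
    0.05,        # max correspondence distance (meters); dataset-dependent assumption
    np.eye(4),   # identity initial alignment
    o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print(result.transformation)   # 4x4 rigid transform aligning source to target
```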
Ensuring driver readiness poses challenges, yet driver monitoring systems can assist in determining the driver's state. By observing visual cues, such systems recognize various behaviors and associate them with specific conditions. For instance, yawning or eye blinking can indicate driver drowsiness. Consequently, an abundance of distributed data is generated for driver monitoring. Employing machine learning techniques, such as driver drowsiness detection, presents a potential solution. However, transmitting the data to a central machine for model training is impractical due to the large data size and privacy concerns. Conversely, training on a single vehicle would limit the available data and likely result in inferior performance. To address these issues, we propose a federated learning framework for drowsiness detection within a vehicular network, leveraging the YawDD dataset. Our approach achieves an accuracy of 99.2%, demonstrating its promise and comparability to conventional deep learning techniques. Lastly, we show how our model scales with varying numbers of federated clients.
https://arxiv.org/abs/2405.03311
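The federated piece can be made concrete with a minimal FedAvg round: each vehicle trains locally and only model weights are averaged centrally, so raw video never leaves the car. The tiny model, client count, and random "local updates" below are stand-ins:

```python
# Illustrative sketch (PyTorch): one FedAvg round. Each client (vehicle)
# would train locally on its own drowsiness data; only weights are shared.
import copy
import torch
import torch.nn as nn

def federated_average(client_models):
    """Average the parameters of client models into one global state dict."""
    avg = copy.deepcopy(client_models[0].state_dict())
    for key in avg:
        avg[key] = torch.stack(
            [m.state_dict()[key].float() for m in client_models]).mean(dim=0)
    return avg

global_model = nn.Linear(128, 2)                 # stand-in drowsiness classifier
clients = [copy.deepcopy(global_model) for _ in range(5)]
for client in clients:                           # placeholder for local training
    with torch.no_grad():
        client.weight.add_(0.01 * torch.randn_like(client.weight))

global_model.load_state_dict(federated_average(clients))
```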
Chronic Obstructive Pulmonary Disease (COPD) is a chronic inflammatory lung condition that causes airflow obstruction. Existing methods can only detect patients who already have COPD, based on obvious features shown in the spirogram (in this article, the spirogram specifically refers to the measured Volume-Flow curve time series); they fail to predict an individual's future probability of developing COPD from subtle features in the spirogram. Yet early prediction of COPD risk is vital for monitoring COPD disease progression, slowing it down, or even preventing its onset. To address this gap, we propose DeepSpiro, the first deep learning based method for early prediction of future COPD risk. DeepSpiro consists of four parts. First, we construct Volume-Flow curves guided by Time-Volume instability smoothing (SpiroSmoother) to enhance the stability of the original Volume-Flow curves. Second, we extract critical features from the evolution of varied-length key patches (SpiroEncoder) to distill the key temporal evolution of the original high-dimensional dynamic sequences into a unified low-dimensional temporal representation. Third, we explain the model via temporal attention and heterogeneous feature fusion (SpiroExplainer), which integrates information from heterogeneous data such as the spirogram and demographic information. Fourth, we predict the risk of COPD based on the evolution of key patch concavity (SpiroPredictor), enabling accurate risk prediction for high-risk patients who are not yet diagnosed, up to 1, 2, 3, 4, and 5 years ahead, and beyond. We conduct experiments on the UK Biobank dataset. The results show that DeepSpiro achieves an AUC of 0.8328 on the COPD detection task. In the early prediction task, the predicted high-risk and low-risk groups show significant differences in future outcomes, with a p-value of <0.001.
https://arxiv.org/abs/2405.03239
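The first stage, stabilizing Volume-Flow curves via Time-Volume smoothing, can be approximated in spirit as below: the volume-time trace is smoothed and then differentiated into flow. The Savitzky-Golay filter is an assumed stand-in for SpiroSmoother, not the paper's algorithm:

```python
# Illustrative sketch: smooth a volume-time trace and differentiate it into a
# flow signal to obtain a stabler Volume-Flow curve. Savitzky-Golay smoothing
# is an assumed substitute for the paper's SpiroSmoother.
import numpy as np
from scipy.signal import savgol_filter

fs = 100.0                                        # assumed sampling rate, Hz
t = np.arange(0, 6, 1 / fs)
volume = 4.0 * (1 - np.exp(-t)) + 0.05 * np.random.randn(t.size)  # noisy exhalation

volume_smooth = savgol_filter(volume, window_length=51, polyorder=3)
flow = savgol_filter(volume, window_length=51, polyorder=3, deriv=1, delta=1 / fs)

# (volume_smooth, flow) pairs trace the stabilized Volume-Flow curve
print(volume_smooth.shape, flow.shape)
```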
Brain disorders are a major challenge to global health, causing millions of deaths each year. Accurate diagnosis of these diseases relies heavily on advanced medical imaging techniques such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). However, the scarcity of annotated data poses a significant challenge in deploying machine learning models for medical diagnosis. To address this limitation, deep learning techniques have shown considerable promise. Domain adaptation techniques enhance a model's ability to generalize across imaging modalities by transferring knowledge from one domain (e.g., CT images) to another (e.g., MRI images). Such cross-modality adaptation is essential for improving the ability of models to generalize consistently across different imaging modalities. This study collected relevant resources from the Kaggle website and employed Maximum Mean Discrepancy (MMD) - a popular domain adaptation method - to reduce the differences between imaging domains. By combining MMD with Convolutional Neural Networks (CNNs), the accuracy and utility of the model are markedly enhanced. The strong experimental results highlight the great potential of data-driven domain adaptation techniques to improve diagnostic accuracy and efficiency, especially in resource-limited environments. By bridging the gap between different imaging modalities, the study aims to provide clinicians with more reliable diagnostic tools.
https://arxiv.org/abs/2405.03235
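A minimal version of the MMD penalty used for alignment: the squared MMD between CT-domain and MRI-domain feature batches under a Gaussian kernel, which is added to the CNN's classification loss; the bandwidth and feature dimensions below are illustrative:

```python
# Illustrative sketch (PyTorch): squared MMD between two feature batches with
# a Gaussian (RBF) kernel. In MMD-based adaptation this term is added to the
# task loss to pull source (e.g., CT) and target (e.g., MRI) features together.
import torch

def mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD with an RBF kernel."""
    def rbf(a, b):
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))
    return rbf(x, x).mean() + rbf(y, y).mean() - 2 * rbf(x, y).mean()

ct_features = torch.randn(32, 128)    # CNN features from a CT batch (stand-in)
mri_features = torch.randn(32, 128)   # CNN features from an MRI batch (stand-in)
loss_mmd = mmd2(ct_features, mri_features)
print(loss_mmd)  # total loss would be: classification loss + lambda * loss_mmd
```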
Classifying a pedestrian into one of the three conveyor states of "elevator," "escalator," and "neither" is fundamental to many applications such as indoor localization and people flow analysis. We estimate, for the first time, the pedestrian conveyor state given the inertial navigation system (INS) readings of the accelerometer, gyroscope, and magnetometer sampled from the phone. The problem is challenging because the INS signals of the conveyor state are coupled with and perturbed by unpredictable, arbitrary human actions, confusing the decision process. We propose ELESON, a novel, effective, and lightweight INS-based deep learning approach to classify whether a pedestrian is in an elevator, on an escalator, or neither. ELESON utilizes a motion feature extractor to decouple the conveyor state from human action in the feature space, and a magnetic feature extractor to account for the speed difference between elevators and escalators. Given the results of the extractors, it employs an evidential state classifier to estimate the confidence of the pedestrian states. Based on extensive experiments conducted on twenty hours of real pedestrian data, we demonstrate that ELESON significantly outperforms state-of-the-art approaches (in which the combined INS signals of both the conveyor state and human actions are processed together), with a 15% classification improvement in F1 score, stronger confidence discriminability with a 10% increase in AUROC (Area Under the Receiver Operating Characteristic curve), and low computational and memory requirements on smartphones.
https://arxiv.org/abs/2405.03218
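The "evidential state classifier" idea — output per-class evidence and derive both class probabilities and an explicit confidence — is commonly implemented with a Dirichlet head, sketched below; whether ELESON follows this exact formulation is an assumption:

```python
# Illustrative sketch (PyTorch): an evidential (Dirichlet) classification head.
# Non-negative evidence per class gives Dirichlet parameters alpha = e + 1;
# expected probabilities and an uncertainty mass fall out in closed form.
# Whether ELESON uses exactly this construction is an assumption.
import torch
import torch.nn.functional as F

num_classes = 3                               # elevator / escalator / neither
logits = torch.randn(4, num_classes)          # stand-in for fused features -> head

evidence = F.softplus(logits)                 # evidence must be non-negative
alpha = evidence + 1.0                        # Dirichlet concentration parameters
strength = alpha.sum(dim=1, keepdim=True)

probs = alpha / strength                      # expected class probabilities
uncertainty = num_classes / strength          # subjective-logic uncertainty mass
print(probs, uncertainty.squeeze(), sep="\n")
```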
Deep learning has emerged as a promising approach for learning the nonlinear mapping between diffusion-weighted MR images and tissue parameters, enabling automatic and deep understanding of brain microstructure. However, the efficiency and accuracy of multi-parametric estimation are still limited, since previous studies tend to estimate multi-parametric maps with dense sampling and isolated signal modeling. This paper proposes DeepMpMRI, a unified framework for fast and high-fidelity multi-parametric estimation from various diffusion models using sparsely sampled q-space data. DeepMpMRI is equipped with a newly designed tensor-decomposition-based regularizer that effectively captures fine details by exploiting the correlation across parameters. In addition, we introduce a Nesterov-based adaptive learning algorithm that dynamically optimizes the regularization parameter to enhance performance. DeepMpMRI is an extendable framework capable of incorporating flexible network architectures. Experimental results demonstrate the superiority of our approach over five state-of-the-art methods in simultaneously estimating multi-parametric maps for various diffusion models with fine-grained detail, both quantitatively and qualitatively, achieving a 4.5 - 22.5$\times$ acceleration compared to dense sampling with a total of 270 diffusion gradients.
https://arxiv.org/abs/2405.03159
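The abstract's tensor-decomposition-based regularizer is not specified in detail; one simple low-rank surrogate that exploits correlation across parameter maps is a nuclear-norm penalty on the matricized parameter stack, sketched below as an assumption-laden illustration rather than the paper's design:

```python
# Illustrative sketch (PyTorch): a nuclear-norm penalty on matricized
# parameter maps, a simple low-rank surrogate exploiting cross-parameter
# correlation. This is an assumed stand-in, not DeepMpMRI's exact regularizer.
import torch

def low_rank_penalty(param_maps):
    """param_maps: (num_params, H, W) stack of estimated parameter maps."""
    p, h, w = param_maps.shape
    matricized = param_maps.reshape(p, h * w)      # mode-1 unfolding
    return torch.linalg.svdvals(matricized).sum()  # nuclear norm of the unfolding

maps = torch.randn(6, 96, 96, requires_grad=True)  # e.g., 6 diffusion parameters
loss = low_rank_penalty(maps)                      # added to the data-fit loss
loss.backward()
print(loss.item())
```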
This project investigates a human multi-modal behavior identification algorithm based on deep neural networks. According to the characteristics of the different modalities, different deep neural networks are used to adapt to the different modal video information. Through the integration of these networks, the algorithm successfully identifies behaviors across multiple modalities. In this project, multiple Microsoft Kinect cameras were used to collect skeleton point data alongside conventional images, from which the motion features in the images can be extracted. Ultimately, the behavioral features discerned through both approaches are fused to facilitate the precise identification and categorization of behaviors. The performance of the proposed algorithm was evaluated on the MSR3D dataset. The findings from these experiments indicate that its behavior recognition accuracy remains consistently high, suggesting that the algorithm is reliable in various scenarios. Additionally, the tests demonstrate that the algorithm substantially improves the accuracy of detecting pedestrian behaviors in video footage.
https://arxiv.org/abs/2405.03091
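One standard way to realize the described fusion of image-based and skeleton-based predictions is late score fusion, averaging each modality network's class probabilities; the sketch below treats the two networks as black boxes, and all sizes are illustrative:

```python
# Illustrative sketch (PyTorch): late fusion of an RGB stream and a skeleton
# stream by averaging class probabilities. The two tiny models stand in for
# the modality-specific deep networks described in the abstract.
import torch
import torch.nn as nn

num_actions = 20                            # assumed MSR3D-style action classes
rgb_net = nn.Linear(512, num_actions)       # stand-in RGB-feature classifier
skeleton_net = nn.Linear(60, num_actions)   # stand-in skeleton classifier

rgb_feat = torch.randn(1, 512)              # features from an image backbone
skel_feat = torch.randn(1, 60)              # e.g., 20 joints x 3 coordinates

probs = 0.5 * rgb_net(rgb_feat).softmax(-1) + 0.5 * skeleton_net(skel_feat).softmax(-1)
print("predicted action:", probs.argmax(dim=-1).item())
```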
Malware detection is a constant challenge in cybersecurity due to the rapid development of new attack techniques. Traditional signature-based approaches struggle to keep pace with the sheer volume of malware samples. Machine learning offers a promising solution, but faces issues of generalization to unseen samples and a lack of explanation for the instances identified as malware. However, human-understandable explanations are especially important in security-critical fields, where understanding model decisions is crucial for trust and legal compliance. While deep learning models excel at malware detection, their black-box nature hinders explainability. Conversely, interpretable models often fall short in performance. To bridge this gap in this application domain, we propose the use of Logic Explained Networks (LENs), a recently proposed class of interpretable neural networks that provide explanations in the form of First-Order Logic (FOL) rules. This paper extends the application of LENs to the complex domain of malware detection, specifically using the large-scale EMBER dataset. In the experimental results, we show that LENs achieve robustness exceeding that of traditional interpretable methods and rivaling that of black-box models. Moreover, we introduce a tailored version of LENs that is shown to generate logic explanations with higher fidelity to the model's predictions.
https://arxiv.org/abs/2405.03009
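The LEN mechanism itself is not reproduced here, but the flavor of its output — a logic rule over human-readable features explaining the malware class — can be illustrated with a toy truth-table-to-DNF extraction over binary features; the feature names and the stand-in model are invented for illustration:

```python
# Toy illustration: the kind of FOL-style rule a LEN produces, extracted here
# by brute-force truth-table enumeration of a tiny black-box over binary
# features. Feature names are invented; this is not the LEN algorithm itself.
from itertools import product

features = ["writes_registry", "calls_crypto_api", "packed_sections"]

def model(x):                      # stand-in classifier over binary features
    return (x[0] and x[2]) or x[1]

terms = []
for assignment in product([0, 1], repeat=len(features)):
    if model(assignment):          # collect minterms predicted as "malware"
        literals = [name if bit else f"~{name}"
                    for name, bit in zip(features, assignment)]
        terms.append("(" + " & ".join(literals) + ")")

print("malware <-> " + " | ".join(terms))   # DNF explanation of the decision
```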
Previous research on scanpath prediction has mainly focused on group models, disregarding the fact that the scanpaths and attentional behaviors of individuals are diverse. The disregard of these differences is especially detrimental to social human-robot interaction, in which robots commonly emulate human gaze based on heuristics or predefined patterns. However, human gaze patterns are heterogeneous, and varying behaviors can significantly affect the outcomes of such human-robot interactions. To fill this gap, we developed a deep learning-based social cue integration model for saliency prediction that instead predicts scanpaths in videos. Our model learns scanpaths by recursively integrating fixation history and social cues through a gating mechanism and sequential attention. We evaluated our approach on gaze datasets of dynamic social scenes observed under the free-viewing condition. The introduction of fixation history into our model makes it possible to train a single unified model rather than taking the resource-intensive approach of training individual models for each set of scanpaths. We observed that the late neural integration approach surpasses early fusion when models are trained on a large dataset, compared to a smaller dataset with a similar distribution. The results also indicate that a single unified model, trained on all the observers' scanpaths, performs on par with or better than individually trained models. We hypothesize that this outcome results from the group saliency representations instilling universal attention in the model while the supervisory signal guides it to learn personalized attentional behaviors; the unified model thus benefits over individual models from its implicit representation of universal attention.
https://arxiv.org/abs/2405.02929
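The described gated, recursive integration of fixation history with social cues can be sketched as a simple learned gate that blends the two embeddings at each step; the dimensions and the convex-blend form are illustrative assumptions about the paper's mechanism:

```python
# Illustrative sketch (PyTorch): a gating module blending a fixation-history
# embedding with a social-cue embedding at each timestep. Dimensions and the
# convex-blend form are assumptions about the paper's gating mechanism.
import torch
import torch.nn as nn

class CueGate(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, fixation_hist, social_cues):
        g = self.gate(torch.cat([fixation_hist, social_cues], dim=-1))
        return g * fixation_hist + (1 - g) * social_cues  # per-dim convex blend

gate = CueGate(dim=64)
h_fix = torch.randn(1, 64)     # running encoding of past fixations
h_soc = torch.randn(1, 64)     # encoding of social cues (faces, gaze, etc.)
fused = gate(h_fix, h_soc)     # would feed the scanpath decoder each timestep
print(fused.shape)
```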
Deep learning-based image restoration methods have achieved promising performance. However, how to faithfully preserve the structure of the original image remains challenging. To address this challenge, we propose a novel Residual-Conditioned Optimal Transport (RCOT) approach, which models the image restoration as an optimal transport (OT) problem for both unpaired and paired settings, integrating the transport residual as a unique degradation-specific cue for both the transport cost and the transport map. Specifically, we first formalize a Fourier residual-guided OT objective by incorporating the degradation-specific information of the residual into the transport cost. Based on the dual form of the OT formulation, we design the transport map as a two-pass RCOT map that comprises a base model and a refinement process, in which the transport residual is computed by the base model in the first pass and then encoded as a degradation-specific embedding to condition the second-pass restoration. By duality, the RCOT problem is transformed into a minimax optimization problem, which can be solved by adversarially training neural networks. Extensive experiments on multiple restoration tasks show the effectiveness of our approach in terms of both distortion measures and perceptual quality. Particularly, RCOT restores images with more faithful structural details compared to state-of-the-art methods.
https://arxiv.org/abs/2405.02843
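The two-pass structure — a base restoration whose transport residual is embedded and used to condition a refinement pass — can be sketched as below; the concrete layers are toy placeholders, since the paper's base model and refiner are full networks trained adversarially under the OT objective:

```python
# Illustrative sketch (PyTorch): the two-pass idea of RCOT. Pass 1 produces a
# base restoration; the transport residual (base output minus input) is
# embedded and conditions pass 2. All layers here are toy placeholders.
import torch
import torch.nn as nn

class TwoPassRestorer(nn.Module):
    def __init__(self, ch=3, width=16):
        super().__init__()
        self.base = nn.Sequential(nn.Conv2d(ch, width, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(width, ch, 3, padding=1))
        self.embed = nn.Conv2d(ch, width, 3, padding=1)   # residual embedding
        self.refine = nn.Sequential(nn.Conv2d(ch + width, width, 3, padding=1),
                                    nn.ReLU(), nn.Conv2d(width, ch, 3, padding=1))

    def forward(self, degraded):
        base_out = self.base(degraded)         # pass 1: base restoration
        residual = base_out - degraded         # degradation-specific cue
        cond = self.embed(residual)            # encode the residual
        return self.refine(torch.cat([base_out, cond], dim=1))  # pass 2

x = torch.randn(1, 3, 64, 64)                  # stand-in degraded image
print(TwoPassRestorer()(x).shape)              # torch.Size([1, 3, 64, 64])
```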
We attempt to use a convolutional neural network to perform kinematic analysis of plane bar structures. Using the 3ds Max animation software and the OpenCV module, we built image datasets of geometrically stable and geometrically unstable systems, and we constructed and trained a convolutional neural network model based on the TensorFlow and Keras deep learning frameworks. The model achieves 100% accuracy on the training, validation, and test sets. The accuracy on an additional test set is 93.7%, indicating that a convolutional neural network can learn and master the relevant knowledge of kinematic analysis in structural mechanics. In the future, the generalization ability of the model can be improved by diversifying the dataset, giving it the potential to surpass human experts on complex structures. Convolutional neural networks therefore have practical value in the field of kinematic analysis of structural mechanics. Using visualization techniques, we reveal how the convolutional neural network learns and recognizes structural features. Using a pre-trained VGG16 model for feature extraction and fine-tuning, we found that its generalization ability is inferior to that of the self-built model.
https://arxiv.org/abs/2405.02807
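Since the abstract names TensorFlow and Keras, a minimal binary classifier of the kind described (stable vs. unstable system images) might look as follows; the layer sizes and input resolution are assumptions, not the paper's exact architecture:

```python
# Illustrative sketch (Keras): a small CNN classifying structure images as
# geometrically stable vs. unstable. Layer sizes and the 128x128 input
# resolution are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),
    layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # P(geometrically unstable)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, validation_data=(val_images, val_labels))
```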
This paper introduces a novel approach for enhanced lane detection by integrating spatial, angular, and temporal information through light field imaging and novel deep learning models. Utilizing lenslet-inspired 2D light field representations and LSTM networks, our method significantly improves lane detection in challenging conditions. We demonstrate the efficacy of this approach with modified CNN architectures, showing superior performance over traditional methods. Our findings suggest this integrated data approach could advance lane detection technologies and inspire new models that leverage these multidimensional insights for autonomous vehicle perception.
https://arxiv.org/abs/2405.02792
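A minimal version of the described temporal pipeline — per-frame CNN features from the 2D light field representation fed to an LSTM — is sketched below; the shapes and the single-logit lane-presence head are illustrative assumptions about the architecture:

```python
# Illustrative sketch (PyTorch): per-frame CNN features from a lenslet-style
# 2D light field image fed through an LSTM over time. Shapes and the simple
# lane-presence head are assumptions about the paper's architecture.
import torch
import torch.nn as nn

class LightFieldLaneNet(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                 nn.Linear(16, feat_dim))
        self.lstm = nn.LSTM(feat_dim, 32, batch_first=True)
        self.head = nn.Linear(32, 1)            # toy lane-presence logit

    def forward(self, frames):                  # (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])            # prediction at the last timestep

clip = torch.randn(2, 8, 3, 128, 128)           # stand-in 8-frame light field clip
print(LightFieldLaneNet()(clip).shape)          # torch.Size([2, 1])
```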