Long-range sequence modeling is a crucial aspect of natural language processing and time series analysis. However, traditional models like Recurrent Neural Networks (RNNs) and Transformers suffer from computational and memory inefficiencies, especially when dealing with long sequences. This paper introduces Logarithmic Memory Networks (LMNs), a novel architecture that leverages a hierarchical logarithmic tree structure to efficiently store and retrieve past information. LMNs dynamically summarize historical context, significantly reducing the memory footprint and computational complexity of attention mechanisms from O(n^2) to O(log n). The model employs a single-vector, targeted attention mechanism to access stored information, and the memory block construction worker (summarizer) layer operates in two modes: a parallel execution mode during training, for efficient processing of hierarchical tree structures, and a sequential execution mode during inference, where it acts as a memory management system. The model also implicitly encodes positional information, eliminating the need for explicit positional encodings. These features make LMNs a robust and scalable solution for processing long-range sequences in resource-constrained environments, offering practical improvements in efficiency and scalability. The code is publicly available under the MIT License on GitHub: this https URL.
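A minimal sketch of the logarithmic memory idea, assuming a simple averaging summarizer in place of the paper's learned summarizer layer: slot i holds a summary of up to 2^i past steps, merges follow a binary-counter pattern, and attention over the occupied slots costs O(log n) per query.

```python
import numpy as np

class LogMemory:
    def __init__(self, dim, levels=32):
        self.slots = [None] * levels                 # slot i summarizes up to 2^i steps
        self.dim = dim

    def append(self, x):
        carry = x
        for i in range(len(self.slots)):             # binary-counter style merge
            if self.slots[i] is None:
                self.slots[i] = carry
                return
            carry = 0.5 * (self.slots[i] + carry)    # averaging stand-in for the summarizer
            self.slots[i] = None

    def attend(self, query):
        mem = np.stack([s for s in self.slots if s is not None])
        scores = mem @ query / np.sqrt(self.dim)     # O(log n) dot products
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ mem                         # single context vector per query

mem = LogMemory(dim=8)
rng = np.random.default_rng(0)
for _ in range(1000):                                # 1000 steps occupy only ~10 slots
    mem.append(rng.standard_normal(8))
context = mem.attend(rng.standard_normal(8))
```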
https://arxiv.org/abs/2501.07905
Parkinson's Disease (PD) is a degenerative neurological disorder that impairs motor and non-motor functions, significantly reducing quality of life and increasing mortality risk. Early and accurate detection of PD progression is vital for effective management and improved patient outcomes. Current diagnostic methods, however, are often costly, time-consuming, and require specialized equipment and expertise. This work proposes an innovative approach to predicting PD progression using regression methods, Long Short-Term Memory (LSTM) networks, and Kolmogorov-Arnold Networks (KAN). KAN, utilizing spline-parametrized univariate functions, allows for dynamic learning of activation patterns, unlike traditional linear models. The Movement Disorder Society-Sponsored Revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS) is a comprehensive tool for evaluating PD symptoms and is commonly used to measure disease progression. Additionally, protein or peptide abnormalities are linked to PD onset and progression. Identifying these associations can aid in predicting disease progression and understanding molecular changes. By comparing multiple models, including LSTM and KAN, this study aims to identify the method that delivers the best predictive performance. The analysis reveals that KAN, with its dynamic learning capabilities, outperforms other approaches in predicting PD progression. This research highlights the potential of AI and machine learning in healthcare, paving the way for advanced computational models to enhance clinical predictions and improve patient care and treatment strategies in PD management.
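A toy sketch of a KAN-style layer, assuming a Gaussian radial basis as a stand-in for the spline parametrization mentioned above: each input-output edge carries its own learnable univariate function phi(x) = sum_k c_k b_k(x), rather than a fixed activation applied to a linear weight.

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    def __init__(self, d_in, d_out, n_basis=8, x_min=-2.0, x_max=2.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(x_min, x_max, n_basis))
        self.width = (x_max - x_min) / n_basis
        # one coefficient vector per (input, output) edge function
        self.coef = nn.Parameter(torch.randn(d_in, d_out, n_basis) * 0.1)

    def forward(self, x):                                  # x: (batch, d_in)
        # evaluate the basis at every input: (batch, d_in, n_basis)
        b = torch.exp(-((x[..., None] - self.centers) / self.width) ** 2)
        # sum the learned edge functions over inputs: (batch, d_out)
        return torch.einsum("bik,iok->bo", b, self.coef)

layer = KANLayer(d_in=4, d_out=2)
y = layer(torch.randn(16, 4))                              # (16, 2)
```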
https://arxiv.org/abs/2412.20744
Accurate tool wear prediction is essential for maintaining productivity and minimizing costs in machining. However, the complex nature of the tool wear process poses significant challenges to achieving reliable predictions. This study explores data-driven methods, in particular deep learning, for tool wear prediction. Traditional data-driven approaches often focus on a single process, relying on multi-sensor setups and extensive data generation, which limits generalization to new settings. Moreover, multi-sensor integration is often impractical in industrial environments. To address these limitations, this research investigates the transferability of predictive models using minimal training data, validated across two processes. Furthermore, it uses a simple setup with a single acceleration sensor to establish a low-cost data generation approach that facilitates the generalization of models to other processes via transfer learning. The study evaluates several machine learning models, including convolutional neural networks (CNN), long short-term memory networks (LSTM), support vector machines (SVM), and decision trees, trained on different input formats such as feature vectors and the short-time Fourier transform (STFT). The performance of the models is evaluated on different amounts of training data, including scenarios with significantly reduced datasets, providing insight into their effectiveness under constrained data conditions. The results demonstrate the potential of specific models and configurations for effective tool wear prediction, contributing to the development of more adaptable and efficient predictive maintenance strategies in machining. Notably, the ConvNeXt model delivers exceptional performance, achieving 99.1% accuracy in identifying tool wear using data from only four milling tools operated until they were worn.
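A minimal sketch of the low-cost input pipeline described above, assuming SciPy and an arbitrary sampling rate: a single acceleration channel is converted into a log-magnitude STFT representation that an image model such as ConvNeXt could consume.

```python
import numpy as np
from scipy.signal import stft

fs = 10_000                                   # assumed sensor sampling rate (Hz)
accel = np.random.randn(fs * 2)               # placeholder for a 2 s recording
f, t, Z = stft(accel, fs=fs, nperseg=256, noverlap=128)
features = np.log1p(np.abs(Z))                # log-magnitude spectrogram
print(features.shape)                         # (frequency bins, time frames)
```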
https://arxiv.org/abs/2412.19950
This project aims to develop a robust video surveillance system, which can segment videos into smaller clips based on the detection of activities. It uses CCTV footage, for example, to record only major events, such as the appearance of a person or a thief, so that storage is optimized and digital searches are easier. It utilizes the latest techniques in object detection and tracking, including Convolutional Neural Networks (CNNs) like YOLO, SSD, and Faster R-CNN, as well as Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs), to achieve high accuracy in detection and capture temporal dependencies. The approach incorporates adaptive background modeling through Gaussian Mixture Models (GMM) and optical flow methods like Lucas-Kanade to detect motion. Multi-scale and contextual analysis are used to improve detection across different object sizes and environments. A hybrid motion segmentation strategy combines statistical and deep learning models to manage complex movements, while optimizations for real-time processing ensure efficient computation. Tracking methods, such as Kalman Filters and Siamese networks, are employed to maintain smooth tracking even in cases of occlusion. Results demonstrate high precision and recall in detecting and tracking objects, with significant improvements in processing times and accuracy due to real-time optimizations and illumination-invariant features. The impact of this research lies in its potential to transform video surveillance, reducing storage requirements and enhancing security through reliable and efficient object detection and tracking.
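A hedged sketch of the activity-gated recording idea, using OpenCV's MOG2 Gaussian-mixture background model to flag motion; the file name and the 2% foreground threshold are illustrative assumptions, not values from the project.

```python
import cv2

cap = cv2.VideoCapture("cctv.mp4")            # hypothetical input file
bg = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
event_frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                    # per-pixel foreground mask
    motion_ratio = (mask > 0).mean()          # fraction of moving pixels
    if motion_ratio > 0.02:                   # keep frames only during activity
        event_frames.append(frame)
cap.release()
print(f"kept {len(event_frames)} frames with activity")
```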
https://arxiv.org/abs/2412.05331
Wastewater treatment plants face unique challenges for process control due to their complex dynamics, slow time constants, and stochastic delays in observations and actions. These characteristics make conventional control methods, such as Proportional-Integral-Derivative controllers, suboptimal for achieving efficient phosphorus removal, a critical component of wastewater treatment to ensure environmental sustainability. This study addresses these challenges using a novel deep reinforcement learning approach based on the Soft Actor-Critic algorithm, integrated with a custom simulator designed to model the delayed feedback inherent in wastewater treatment plants. The simulator incorporates Long Short-Term Memory networks for accurate multi-step state predictions, enabling realistic training scenarios. To account for the stochastic nature of delays, agents were trained under three delay scenarios: no delay, constant delay, and random delay. The results demonstrate that incorporating random delays into the reinforcement learning framework significantly improves phosphorus removal efficiency while reducing operational costs. Specifically, the delay-aware agent achieved a 36% reduction in phosphorus emissions, a 55% higher reward, a 77% lower target deviation from the regulatory limit, and 9% lower total costs than traditional control methods in the simulated environment. These findings underscore the potential of reinforcement learning to overcome the limitations of conventional control strategies in wastewater treatment, providing an adaptive and cost-effective solution for phosphorus removal.
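A minimal sketch of the random-delay scenario described above, assuming a gym-style environment interface: a wrapper releases each observation only after a random lag, so the agent (e.g., a Soft Actor-Critic learner) trains on the stale feedback typical of wastewater plants. The delay range is an illustrative assumption.

```python
import random
from collections import deque

class RandomDelayWrapper:
    def __init__(self, env, max_delay=5):
        self.env = env
        self.max_delay = max_delay
        self.buffer = deque()                  # queue of (release_step, observation)
        self.step_count = 0
        self.last_obs = None

    def reset(self):
        self.buffer.clear()
        self.step_count = 0
        self.last_obs = self.env.reset()
        return self.last_obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        delay = random.randint(0, self.max_delay)
        self.buffer.append((self.step_count + delay, obs))
        self.step_count += 1
        # release every observation whose delay has elapsed; otherwise the
        # agent keeps acting on the most recently released (stale) observation
        while self.buffer and self.buffer[0][0] <= self.step_count:
            _, self.last_obs = self.buffer.popleft()
        return self.last_obs, reward, done, info
```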
https://arxiv.org/abs/2411.18305
Phishing is one of the most effective ways in which cybercriminals get sensitive details such as credentials for online banking, digital wallets, state secrets, and many more from potential victims. They do this by spamming users with malicious URLs with the sole purpose of tricking them into divulging sensitive information, which is later used for various cybercrimes. In this research, we did a comprehensive review of current state-of-the-art machine learning and deep learning phishing detection techniques to expose their vulnerabilities and future research directions. For better analysis and observation, we split machine learning techniques into Bayesian, non-Bayesian, and deep learning. We reviewed the most recent advances in Bayesian and non-Bayesian-based classifiers before exploiting their corresponding weaknesses to indicate future research directions. While exploiting weaknesses in both Bayesian and non-Bayesian classifiers, we also compared each performance with a deep learning classifier. For a proper review of deep learning-based classifiers, we looked at Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), and Long Short-Term Memory networks (LSTMs). We performed an empirical analysis to evaluate the performance of each classifier, along with many of the proposed state-of-the-art anti-phishing techniques, to identify future research directions. We also made a series of proposals on how the performance of the under-performing algorithms can be improved, in addition to proposing a two-stage prediction model.
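A hedged sketch of a two-stage URL classifier in the spirit of the proposal above: a fast Bayesian first stage handles confident cases and defers uncertain URLs to a heavier second stage (here a logistic regression stands in for a deep model); the toy URLs and the 0.9 confidence cut-off are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

urls = ["http://paypa1-login.example", "https://github.com",
        "http://free-wallet-prize.example", "https://arxiv.org"]
labels = [1, 0, 1, 0]                          # 1 = phishing (toy data)

vec = CountVectorizer(analyzer="char", ngram_range=(3, 5))
X = vec.fit_transform(urls)
stage1 = MultinomialNB().fit(X, labels)                    # fast Bayesian stage
stage2 = LogisticRegression(max_iter=1000).fit(X, labels)  # stand-in for a deep model

def predict(url, threshold=0.9):
    x = vec.transform([url])
    p = stage1.predict_proba(x)[0]
    if p.max() >= threshold:                   # stage 1 is confident enough
        return int(p.argmax())
    return int(stage2.predict(x)[0])           # defer to the second stage

print(predict("http://secure-bank-verify.example"))
```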
https://arxiv.org/abs/2411.16751
Firing rate models are dynamical systems widely used in applied and theoretical neuroscience to describe local cortical dynamics in neuronal populations. By providing a macroscopic perspective of neuronal activity, these models are essential for investigating oscillatory phenomena, chaotic behavior, and associative memory processes. Despite their widespread use, the application of firing rate models to associative memory networks has received limited mathematical exploration, and most existing studies are focused on specific models. Conversely, well-established associative memory designs, such as Hopfield networks, lack key biologically-relevant features intrinsic to firing rate models, including positivity and interpretable synaptic matrices that reflect excitatory and inhibitory interactions. To address this gap, we propose a general framework that ensures the emergence of re-scaled memory patterns as stable equilibria in the firing rate dynamics. Furthermore, we analyze the conditions under which the memories are locally and globally asymptotically stable, providing insights into constructing biologically-plausible and robust systems for associative memory retrieval.
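To make the setup concrete, a standard positive firing-rate form consistent with the description above (the notation is ours, not necessarily the paper's): the rectification keeps rates non-negative, and the synaptic matrix W can be split into non-negative excitatory and inhibitory parts, with the framework aiming for re-scaled memory patterns to appear among the stable equilibria.

```latex
\tau \dot{x} = -x + \left[ W x + b \right]_+ , \qquad x \in \mathbb{R}^{n}_{\ge 0},
\qquad W = W_E - W_I \ \text{with } W_E, W_I \ge 0,
\qquad x^* = \left[ W x^* + b \right]_+ \ \text{(equilibria)} .
```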
https://arxiv.org/abs/2411.07388
Current cardiac cine magnetic resonance image (cMR) studies focus on the end diastole (ED) and end systole (ES) phases, while ignoring the abundant temporal information in the whole image sequence. This is because whole-sequence segmentation is currently a tedious and inaccurate process. Conventional whole-sequence segmentation approaches first estimate the motion field between frames, which is then used to propagate the mask along the temporal axis. However, the mask propagation results can be prone to error, especially for the basal and apex slices, where through-plane motion leads to significant morphology and structural change during the cardiac cycle. Inspired by recent advances in video object segmentation (VOS) based on spatio-temporal memory (STM) networks, we propose a continuous STM (CSTM) network for semi-supervised whole-heart and whole-sequence cMR segmentation. Our CSTM network takes full advantage of the spatial, scale, temporal, and through-plane continuity priors of the underlying heart anatomy structures to achieve accurate and fast 4D segmentation. Results of extensive experiments across multiple cMR datasets show that our method can improve 4D cMR segmentation performance, especially for the hard-to-segment regions.
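A minimal sketch of the spatio-temporal memory read that STM-style segmentation builds on (the toy shapes and the scaled dot-product score are assumptions): keys from the current frame attend over keys of memorized frames and retrieve values that carry past mask information.

```python
import torch
import torch.nn.functional as F

B, C, T, H, W = 1, 64, 4, 32, 32               # toy sizes
mem_k = torch.randn(B, C, T * H * W)           # keys of memorized frames
mem_v = torch.randn(B, 2 * C, T * H * W)       # values (features + mask info)
qry_k = torch.randn(B, C, H * W)               # keys of the current frame

# affinity between every memory location and every query location
attn = F.softmax(torch.einsum("bcm,bcq->bmq", mem_k, qry_k) / C ** 0.5, dim=1)
read = torch.einsum("bvm,bmq->bvq", mem_v, attn)   # (B, 2C, H*W)
read = read.view(B, 2 * C, H, W)               # fed to the mask decoder head
```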
https://arxiv.org/abs/2410.23191
Personality analysis from online short videos has gained prominence due to its applications in personalized recommendation systems, sentiment analysis, and human-computer interaction. Traditional assessment methods, such as questionnaires based on the Big Five Personality Framework, are limited by self-report biases and are impractical for large-scale or real-time analysis. Leveraging the rich, multi-modal data present in short videos offers a promising alternative for more accurate personality inference. However, integrating these diverse and asynchronous modalities poses significant challenges, particularly in aligning time-varying data and ensuring models generalize well to new domains with limited labeled data. In this paper, we propose a novel multi-modal personality analysis framework that addresses these challenges by synchronizing and integrating features from multiple modalities and enhancing model generalization through domain adaptation. We introduce a timestamp-based modality alignment mechanism that synchronizes data based on spoken word timestamps, ensuring accurate correspondence across modalities and facilitating effective feature integration. To capture temporal dependencies and inter-modal interactions, we employ Bidirectional Long Short-Term Memory networks and self-attention mechanisms, allowing the model to focus on the most informative features for personality prediction. Furthermore, we develop a gradient-based domain adaptation method that transfers knowledge from multiple source domains to improve performance in target domains with scarce labeled data. Extensive experiments on real-world datasets demonstrate that our framework significantly outperforms existing methods in personality prediction tasks, highlighting its effectiveness in capturing complex behavioral cues and robustness in adapting to new domains.
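A hedged sketch of the timestamp-based alignment described above: for each spoken word, the audio and visual frames falling inside its (start, end) interval are pooled into one fused vector, giving the Bi-LSTM a word-synchronized multimodal sequence; the frame rates and feature dimensions are assumptions.

```python
import numpy as np

fps = 25                                       # assumed video frame rate
video = np.random.randn(250, 512)              # 10 s of visual frame features
audio = np.random.randn(1000, 128)             # 100 Hz acoustic features
words = [("hello", 0.2, 0.6), ("world", 0.7, 1.1)]   # (token, start s, end s)

aligned = []
for token, start, end in words:
    v = video[int(start * fps):int(end * fps)].mean(axis=0)
    a = audio[int(start * 100):int(end * 100)].mean(axis=0)
    aligned.append(np.concatenate([v, a]))     # one fused vector per word
aligned = np.stack(aligned)                    # (num words, 640) -> Bi-LSTM input
```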
https://arxiv.org/abs/2411.00813
Reproducibility in scientific research, particularly within the realm of natural language processing (NLP), is essential for validating and verifying the robustness of experimental findings. This paper delves into the reproduction and evaluation of dialogue summarization models, focusing specifically on the discrepancies observed between original studies and our reproduction efforts. Dialogue summarization is a critical aspect of NLP, aiming to condense conversational content into concise and informative summaries, thus aiding in efficient information retrieval and decision-making processes. Our research involved a thorough examination of several dialogue summarization models using the AMI (Augmented Multi-party Interaction) dataset. The models assessed include Hierarchical Memory Networks (HMNet) and various versions of Pointer-Generator Networks (PGN), namely PGN(DKE), PGN(DRD), PGN(DTS), and PGN(DALL). The primary objective was to evaluate the informativeness and quality of the summaries generated by these models through human assessment, a method that introduces subjectivity and variability in the evaluation process. The analysis began with Dataset 1, where the sample standard deviation of 0.656 indicated a moderate dispersion of data points around the mean.
https://arxiv.org/abs/2410.15962
We developed Long Short-Term Memory (LSTM) models to predict the formation of active regions (ARs) on the solar surface. Using the Doppler shift velocity, the continuum intensity, and the magnetic field observations from the Solar Dynamics Observatory (SDO) Helioseismic and Magnetic Imager (HMI), we have created time-series datasets of acoustic power and magnetic flux, which are used to train LSTM models to predict continuum intensity 12 hours in advance. These novel machine learning (ML) models are able to capture variations of the acoustic power density associated with upcoming magnetic flux emergence and continuum intensity decrease. Testing of the models' performance was done on data for 5 ARs unseen by the models during training. Model 8, the best-performing model trained, was able to make a successful prediction of emergence for all testing active regions in an experimental setting and for three of them in an operational setting. The model predicted the emergence of AR11726, AR13165, and AR13179 respectively 10, 29, and 5 hours in advance, and variations of this model achieved average RMSE values of 0.11 for both active and quiet areas on the solar disc. This work sets the foundations for ML-aided prediction of solar ARs.
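A minimal sketch of the forecasting setup (window length and layer sizes are assumptions): an LSTM maps a window of the acoustic-power and magnetic-flux series to the continuum intensity 12 hours ahead.

```python
import torch
import torch.nn as nn

class IntensityForecaster(nn.Module):
    def __init__(self, n_features=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)       # predicted continuum intensity

    def forward(self, x):                      # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])           # prediction 12 h ahead

model = IntensityForecaster()
window = torch.randn(8, 120, 2)                # 120 past samples of 2 series
pred = model(window)                           # (8, 1)
```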
https://arxiv.org/abs/2409.17421
Content-addressable memories such as Modern Hopfield Networks (MHN) have been studied as mathematical models of auto-association and storage/retrieval in human declarative memory, yet their practical use for large-scale content storage faces challenges. Chief among them is the occurrence of meta-stable states, particularly when handling large amounts of high-dimensional content. This paper introduces Hopfield Encoding Networks (HEN), a framework that integrates encoded neural representations into MHNs to improve pattern separability and reduce meta-stable states. We show that HEN can also be used for retrieval in the context of hetero-association of images with natural language queries, thus removing the limitation of requiring access to partial content in the same domain. Experimental results demonstrate a substantial reduction in meta-stable states and increased storage capacity while still enabling perfect recall of a significantly larger number of inputs, advancing the practical utility of associative memory networks for real-world tasks.
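A minimal sketch of retrieval in a Modern Hopfield Network, on which HEN builds (beta and the sizes are assumptions): a query state is iteratively pulled toward the stored pattern it most resembles; in HEN, the stored matrix would hold encoded neural representations rather than raw content.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 128))             # 50 stored patterns
query = X[7] + 0.3 * rng.standard_normal(128)  # noisy partial cue
beta = 4.0                                     # inverse temperature

xi = query
for _ in range(3):                             # converges in a few steps
    xi = softmax(beta * X @ xi) @ X            # MHN energy-descent update

print(np.argmax(X @ xi))                       # should print 7, the retrieved pattern
```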
https://arxiv.org/abs/2409.16408
Tracing a student's knowledge growth given the past exercise answering is a vital objective in automatic tutoring systems to customize the learning experience. Yet, achieving this objective is a non-trivial task, as it involves modeling the knowledge state across multiple knowledge components (KCs) while considering their temporal and relational dynamics during the learning process. Knowledge tracing methods have tackled this task by either modeling KCs' temporal dynamics using recurrent models or relational dynamics across KCs and questions using graph models. However, there is a lack of methods that can learn a joint embedding of the relational and temporal dynamics of the task. Moreover, many methods that account for the impact of a student's forgetting behavior during the learning process use hand-crafted features, limiting their generalization to different scenarios. In this paper, we propose a novel method that jointly models the relational and temporal dynamics of the knowledge state using a deep temporal graph memory network. In addition, we propose a generic technique for representing a student's forgetting behavior using temporal decay constraints on the graph memory module. We demonstrate the effectiveness of our proposed method using multiple knowledge tracing benchmarks while comparing it to state-of-the-art methods.
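A hedged sketch of a temporal-decay forgetting constraint of the kind described above (the exponential form and the time constant are assumptions): memory entries for each knowledge component fade with the time elapsed since that KC was last practiced.

```python
import numpy as np

def decay_memory(memory, elapsed, tau=24.0):
    """memory: (num KCs, dim); elapsed: hours since each KC was last practiced."""
    weights = np.exp(-np.asarray(elapsed) / tau)      # older entries fade toward 0
    return memory * weights[:, None]

memory = np.random.randn(5, 32)                # graph memory, one slot per KC
faded = decay_memory(memory, elapsed=[1, 5, 12, 48, 100])
```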
https://arxiv.org/abs/2410.01836
The field of artificial intelligence faces significant challenges in achieving both biological plausibility and computational efficiency, particularly in visual learning tasks. Current artificial neural networks, such as convolutional neural networks, rely on techniques like backpropagation and weight sharing, which do not align with the brain's natural information processing methods. To address these issues, we propose the Memory Network, a model inspired by biological principles that avoids backpropagation and convolutions, and operates in a single pass. This approach enables rapid and efficient learning, mimicking the brain's ability to adapt quickly with minimal exposure to data. Our experiments demonstrate that the Memory Network achieves efficient and biologically plausible learning, showing strong performance on simpler datasets like MNIST. However, further refinement is needed for the model to handle more complex datasets such as CIFAR10, highlighting the need to develop new algorithms and techniques that closely align with biological processes while maintaining computational efficiency.
https://arxiv.org/abs/2409.17282
Electroencephalography (EEG) signals are crucial for investigating brain function and cognitive processes. This study aims to address the challenges of efficiently recording and analyzing high-dimensional EEG signals while listening to music to recognize emotional states. We propose a method combining Bidirectional Long Short-Term Memory (Bi-LSTM) networks with attention mechanisms for EEG signal processing. Using wearable EEG devices, we collected brain activity data from participants listening to music. The data were preprocessed and segmented, and Differential Entropy (DE) features were extracted. We then constructed and trained a Bi-LSTM model to enhance key feature extraction and improve emotion recognition accuracy. Experiments were conducted on the SEED and DEAP datasets. The Bi-LSTM-AttGW model achieved 98.28% accuracy on the SEED dataset and 92.46% on the DEAP dataset in multi-class emotion recognition tasks, significantly outperforming traditional models such as SVM and EEG-Net. This study demonstrates the effectiveness of combining Bi-LSTM with attention mechanisms, providing robust technical support for applications in brain-computer interfaces (BCI) and affective computing. Future work will focus on improving device design, incorporating multimodal data, and further enhancing emotion recognition accuracy, aiming to achieve practical applications in real-world scenarios.
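A minimal sketch of the Differential Entropy (DE) feature mentioned above: for a band-passed EEG segment modeled as Gaussian, DE reduces to 0.5 * log(2 * pi * e * variance); the band edges and sampling rate are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 200                                       # assumed EEG sampling rate (Hz)
eeg = np.random.randn(fs * 2)                  # one 2 s channel segment

def de_feature(x, low, high, fs):
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    banded = filtfilt(b, a, x)                 # band-pass the segment
    return 0.5 * np.log(2 * np.pi * np.e * banded.var())

bands = {"theta": (4, 8), "alpha": (8, 14), "beta": (14, 31), "gamma": (31, 50)}
features = [de_feature(eeg, lo, hi, fs) for lo, hi in bands.values()]
# per-band DE vectors across channels and segments feed the Bi-LSTM model
```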
https://arxiv.org/abs/2408.12124
In recent years, deep learning has increasingly gained attention in the field of traffic prediction. Existing traffic prediction models often rely on GCNs or attention mechanisms with O(N^2) complexity to dynamically extract traffic node features, which lack efficiency and are not lightweight. Additionally, these models typically only utilize historical data for prediction, without considering the impact of the target information on the prediction. To address these issues, we propose a Pattern-Matching Dynamic Memory Network (PM-DMNet). PM-DMNet employs a novel dynamic memory network to capture traffic pattern features with only O(N) complexity, significantly reducing computational overhead while achieving excellent performance. PM-DMNet also introduces two prediction methods: Recursive Multi-step Prediction (RMP) and Parallel Multi-step Prediction (PMP), which leverage the time features of the prediction targets to assist in the forecasting process. Furthermore, a transfer attention mechanism is integrated into PMP, transforming historical data features to better align with the predicted target states, thereby capturing trend changes more accurately and reducing errors. Extensive experiments demonstrate the superiority of the proposed model over existing benchmarks. The source code is available at: this https URL.
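A hedged sketch of the O(N) pattern-matching idea (the sizes are assumptions): each of the N traffic nodes attends over a small shared bank of P learned memory patterns, so the cost grows as O(N * P) = O(N) rather than the O(N^2) of node-to-node attention.

```python
import torch
import torch.nn.functional as F

N, d, P = 300, 64, 16                          # nodes, feature dim, patterns
nodes = torch.randn(N, d)                      # current traffic node features
memory = torch.randn(P, d, requires_grad=True) # learnable pattern bank

attn = F.softmax(nodes @ memory.T / d ** 0.5, dim=-1)   # (N, P): O(N * P)
pattern_features = attn @ memory               # each node's matched pattern mix
```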
https://arxiv.org/abs/2408.07100
Autonomous robots consistently encounter unforeseen dangerous situations during exploration missions. The characteristic rimless wheels of the AsguardIV rover allow it to overcome challenging terrains. However, steep slopes or difficult maneuvers can cause the rover to tip over and threaten the completion of a mission. This work focuses on identifying early signs or initial stages of potential tip-over events, so that these critical moments can be predicted and detected before they fully occur, possibly preventing accidents and enhancing the safety and stability of the rover during its exploration mission. Inertial Measurement Unit (IMU) readings are used to develop compact, robust, and efficient autoencoders that combine the sequence-processing power of Long Short-Term Memory (LSTM) networks. By leveraging LSTM-based autoencoders, this work contributes predictive capabilities for detecting tip-over risks and developing safety measures for more reliable exploration missions.
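A minimal sketch of an LSTM autoencoder of the kind described above (channel counts and window length are assumptions): IMU windows are compressed and reconstructed, and a large reconstruction error on a new window signals a potential tip-over precursor.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_channels=6, hidden=32):
        super().__init__()
        self.encoder = nn.LSTM(n_channels, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_channels)

    def forward(self, x):                      # x: (batch, time, channels)
        _, (h, _) = self.encoder(x)
        z = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)  # repeat the latent code
        dec, _ = self.decoder(z)
        return self.out(dec)                   # reconstructed window

model = LSTMAutoencoder()
window = torch.randn(4, 50, 6)                 # 4 IMU windows of 50 steps
error = ((model(window) - window) ** 2).mean(dim=(1, 2))  # per-window anomaly score
```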
https://arxiv.org/abs/2408.05602
This paper explores an improved Adaboost algorithm based on Long Short-Term Memory Networks (LSTMs), which aims to improve the prediction accuracy of user clicks on web page advertisements. By comparing it with several common machine learning algorithms, the paper analyses the advantages of the new model in ad click prediction. It is shown that the improved algorithm proposed in this paper performs well in user ad click prediction, with an accuracy of 92%, an improvement of 13.6 percentage points over the best of the three base models (78.4%). This significant improvement indicates that the algorithm is more capable of capturing user behavioural characteristics and time series patterns. In addition, this paper evaluates the model on recall and F1 score alongside accuracy. The results show that the improved Adaboost algorithm based on LSTM is significantly ahead of the traditional model on all these metrics, which further validates its effectiveness and superiority. Especially when facing complex and dynamically changing user behaviours, the model is able to better adapt and make accurate predictions. In order to ensure the practicality and reliability of the model, this study also focuses on the accuracy difference between the training set and the test set. After validation, the accuracy of the proposed model on these two datasets differs by only 1.7%, a small difference indicating that the model has good generalisation ability and can be effectively applied to real-world scenarios.
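A minimal sketch of the AdaBoost reweighting at the core of such an approach, with a random stub in place of the weak learner (in the paper, LSTM-based click predictors would fill that role):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200) * 2 - 1            # labels in {-1, +1}
w = np.ones(200) / 200                         # uniform sample weights

for t in range(5):                             # boosting rounds
    pred = np.where(rng.random(200) < 0.6, y, -y)   # stub weak learner (~60% accurate)
    err = w[pred != y].sum()                   # weighted error of this round
    alpha = 0.5 * np.log((1 - err) / err)      # this learner's vote weight
    w *= np.exp(-alpha * y * pred)             # up-weight misclassified samples
    w /= w.sum()
```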
https://arxiv.org/abs/2408.05245
Before the advent of fault-tolerant quantum computers, variational quantum algorithms (VQAs) play a crucial role in noisy intermediate-scale quantum (NISQ) machines. Conventionally, the optimization of VQAs predominantly relies on manually designed optimizers. However, learning to optimize (L2O) demonstrates impressive performance by training small neural networks to replace handcrafted optimizers. In our work, we propose L2O-$g^{\dagger}$, a $\textit{quantum-aware}$ learned optimizer that leverages the Fubini-Study metric tensor ($g^{\dagger}$) and long short-term memory networks. We theoretically derive the update equation inspired by the lookahead optimizer and incorporate the quantum geometry of the optimization landscape in the learned optimizer to balance fast convergence and generalization. Empirically, we conduct comprehensive experiments across a range of VQA problems. Our results demonstrate that L2O-$g^{\dagger}$ not only outperforms the current SOTA hand-designed optimizer without any hyperparameter tuning but also shows strong out-of-distribution generalization compared to previous L2O optimizers. We achieve this by training L2O-$g^{\dagger}$ on just a single generic PQC instance. Our novel $\textit{quantum-aware}$ learned optimizer, L2O-$g^{\dagger}$, presents an advancement in addressing the challenges of VQAs, making it a valuable tool in the NISQ era.
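In our notation, the flavor of update that L2O-$g^{\dagger}$ combines: a learned LSTM proposal modulated by the Fubini-Study metric tensor of the circuit parameters (the exact composition is the paper's contribution and may differ).

```latex
\theta_{t+1} = \theta_t - \eta \, g^{\dagger}(\theta_t)^{-1} m_t ,
\qquad m_t = \mathrm{LSTM}_{\phi}\!\left(\nabla_{\theta} \mathcal{L}(\theta_t),\, h_t\right),
```

where $g^{\dagger}$ is the Fubini-Study metric tensor of the parametrized circuit and $\mathrm{LSTM}_{\phi}$ is the trained optimizer network with hidden state $h_t$.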
https://arxiv.org/abs/2407.14761
This research introduces a novel anomaly detection method designed to enhance the operational reliability of particle accelerators - complex machines that accelerate elementary particles to high speeds for various scientific applications. Our approach utilizes a Long Short-Term Memory (LSTM) neural network to predict the temperature of key components within the magnet power supplies (PSs) of these accelerators, such as heatsinks, capacitors, and resistors, based on the electrical current flowing through the PS. Anomalies are declared when there is a significant discrepancy between the LSTM-predicted temperatures and actual observations. Leveraging a custom-built test stand, we conducted comprehensive performance comparisons with a less sophisticated method, while also fine-tuning hyperparameters of both methods. This process not only optimized the LSTM model but also unequivocally demonstrated the superior efficacy of this new proposed method. The dedicated test stand also facilitated exploratory work on more advanced strategies for monitoring interior PS temperatures using infrared cameras. A proof-of-concept example is provided.
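A minimal sketch of the anomaly rule described above (the synthetic temperatures and the six-sigma threshold are illustrative assumptions): temperatures predicted from the power-supply current are compared with measurements, and unusually large residuals raise an alarm.

```python
import numpy as np

t_pred = np.random.randn(1000) * 0.2 + 45.0    # LSTM-predicted temperatures (deg C)
t_obs = t_pred + np.random.randn(1000) * 0.2   # measured temperatures
t_obs[500] += 5.0                              # inject a fault

resid = np.abs(t_obs - t_pred)                 # prediction-measurement discrepancy
threshold = resid.mean() + 6 * resid.std()
anomalies = np.where(resid > threshold)[0]
print(anomalies)                               # flags the injected fault at index 500
```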
https://arxiv.org/abs/2405.18321