We developed Long Short-Term Memory (LSTM) models to predict the formation of active regions (ARs) on the solar surface. Using the Doppler shift velocity, the continuum intensity, and the magnetic field observations from the Solar Dynamics Observatory (SDO) Helioseismic and Magnetic Imager (HMI), we have created time-series datasets of acoustic power and magnetic flux, which are used to train LSTM models to predict continuum intensity 12 hours in advance. These novel machine learning (ML) models are able to capture variations of the acoustic power density associated with upcoming magnetic flux emergence and continuum intensity decrease. The models' performance was tested on data for 5 ARs unseen by the models during training. Model 8, the best-performing model, successfully predicted emergence for all testing active regions in an experimental setting and for three of them in an operational setting. The model predicted the emergence of AR11726, AR13165, and AR13179 10, 29, and 5 hours in advance, respectively, and variations of this model achieved average RMSE values of 0.11 for both active and quiet areas on the solar disc. This work sets the foundations for ML-aided prediction of solar ARs.
https://arxiv.org/abs/2409.17421
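For readers who want a concrete starting point, below is a minimal PyTorch sketch of the kind of sequence-to-one LSTM regressor described above: it maps a window of two input channels (acoustic power and magnetic flux, assumed pre-normalized) to the continuum intensity at a fixed lead time. The layer sizes, window length, and hourly 12-step lead are illustrative assumptions, not the authors' configuration.

    import torch
    import torch.nn as nn

    class EmergenceForecaster(nn.Module):
        """Sequence-to-one LSTM: past (acoustic power, magnetic flux) -> future intensity."""
        def __init__(self, n_features=2, hidden=64, n_layers=2):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, n_layers, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):                 # x: (batch, window, n_features)
            out, _ = self.lstm(x)
            return self.head(out[:, -1])      # estimate intensity `lead` steps ahead

    # Toy training loop on random data standing in for the HMI-derived series.
    window, lead = 48, 12                     # assumed: hourly samples, 12-hour lead
    model = EmergenceForecaster()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    series = torch.randn(1000, 2)             # placeholder multivariate time series
    target = torch.randn(1000, 1)             # placeholder continuum intensity
    for step in range(200):
        idx = torch.randint(0, len(series) - window - lead, (32,)).tolist()
        x = torch.stack([series[j:j + window] for j in idx])
        y = torch.stack([target[j + window + lead - 1] for j in idx])
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()

The reported RMSE values would correspond to the square root of this MSE objective, evaluated on held-out active and quiet regions.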
Content-addressable memories such as Modern Hopfield Networks (MHN) have been studied as mathematical models of auto-association and storage/retrieval in human declarative memory, yet their practical use for large-scale content storage faces challenges. Chief among them is the occurrence of meta-stable states, particularly when handling large amounts of high-dimensional content. This paper introduces Hopfield Encoding Networks (HEN), a framework that integrates encoded neural representations into MHNs to improve pattern separability and reduce meta-stable states. We show that HEN can also be used for retrieval in the context of hetero-association of images with natural language queries, thus removing the limitation of requiring access to partial content in the same domain. Experimental results demonstrate a substantial reduction in meta-stable states and increased storage capacity while still enabling perfect recall of a significantly larger number of inputs, advancing the practical utility of associative memory networks for real-world tasks.
https://arxiv.org/abs/2409.16408
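The core retrieval step in a modern Hopfield network is a softmax-attention update over the stored patterns; HEN's twist, as described above, is to store encoded representations rather than raw content. The sketch below shows only that update, on random vectors standing in for encoder outputs; the encoder itself and the image/text hetero-association are omitted, and the inverse temperature `beta` is an assumed value.

    import numpy as np

    def mhn_retrieve(memory, query, beta=8.0, n_iter=3):
        """Modern Hopfield retrieval: xi <- X^T softmax(beta * X xi).

        memory : (num_patterns, dim) stored (encoded) patterns X
        query  : (dim,) possibly noisy or partial probe
        """
        xi = query.copy()
        for _ in range(n_iter):
            scores = beta * memory @ xi                 # similarity to each stored pattern
            scores -= scores.max()                      # numerical stability
            p = np.exp(scores) / np.exp(scores).sum()   # softmax over patterns
            xi = memory.T @ p                           # convex combination of patterns
        return xi

    rng = np.random.default_rng(0)
    stored = rng.standard_normal((100, 256))            # stand-ins for encoded inputs
    probe = stored[7] + 0.3 * rng.standard_normal(256)  # corrupted copy of pattern 7
    recalled = mhn_retrieve(stored, probe)
    print(int(np.argmax(stored @ recalled)))            # 7 when patterns are well separated

Meta-stable states correspond to the softmax spreading its mass over several stored patterns; better-separated (encoded) patterns keep it peaked on the correct one.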
Tracing a student's knowledge growth from their past exercise answers is a vital objective of automatic tutoring systems, enabling a customized learning experience. Yet, achieving this objective is a non-trivial task, as it involves modeling the knowledge state across multiple knowledge components (KCs) while considering their temporal and relational dynamics during the learning process. Knowledge tracing methods have tackled this task by modeling either the temporal dynamics of KCs using recurrent models or the relational dynamics across KCs and questions using graph models. However, there is a lack of methods that can learn a joint embedding of the relational and temporal dynamics of the task. Moreover, many methods that account for the impact of a student's forgetting behavior during the learning process use hand-crafted features, limiting their generalization to different scenarios. In this paper, we propose a novel method that jointly models the relational and temporal dynamics of the knowledge state using a deep temporal graph memory network. In addition, we propose a generic technique for representing a student's forgetting behavior using temporal decay constraints on the graph memory module. We demonstrate the effectiveness of our proposed method on multiple knowledge tracing benchmarks while comparing it to state-of-the-art methods.
https://arxiv.org/abs/2410.01836
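The abstract does not give the exact form of the temporal decay constraint, so the following is only a generic illustration of the idea: a per-KC memory slot whose contribution is exponentially down-weighted by the time elapsed since the student last practiced that knowledge component. The decay rate, slot dimensionality, and update rule are assumptions for illustration, not the paper's formulation.

    import numpy as np

    class DecayingKCMemory:
        """Toy memory over knowledge components (KCs) with exponential forgetting."""
        def __init__(self, n_kcs, dim, decay_rate=0.05, seed=0):
            rng = np.random.default_rng(seed)
            self.values = rng.standard_normal((n_kcs, dim)) * 0.01   # memory slots
            self.last_seen = np.zeros(n_kcs)                         # last practice time per KC
            self.decay_rate = decay_rate

        def write(self, kc, update, t):
            self.values[kc] += update          # e.g. embedding of the answered exercise
            self.last_seen[kc] = t

        def read(self, t):
            # Long-unpracticed KCs contribute less to the predicted knowledge state.
            w = np.exp(-self.decay_rate * (t - self.last_seen))      # (n_kcs,)
            return w[:, None] * self.values                          # decayed memory

    mem = DecayingKCMemory(n_kcs=5, dim=8)
    mem.write(kc=2, update=np.ones(8), t=0.0)
    state = mem.read(t=30.0)                   # KC 2 has decayed after 30 time steps
    print(np.linalg.norm(state, axis=1).round(3))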
The field of artificial intelligence faces significant challenges in achieving both biological plausibility and computational efficiency, particularly in visual learning tasks. Current artificial neural networks, such as convolutional neural networks, rely on techniques like backpropagation and weight sharing, which do not align with the brain's natural information processing methods. To address these issues, we propose the Memory Network, a model inspired by biological principles that avoids backpropagation and convolutions, and operates in a single pass. This approach enables rapid and efficient learning, mimicking the brain's ability to adapt quickly with minimal exposure to data. Our experiments demonstrate that the Memory Network achieves efficient and biologically plausible learning, showing strong performance on simpler datasets like MNIST. However, further refinement is needed for the model to handle more complex datasets such as CIFAR10, highlighting the need to develop new algorithms and techniques that closely align with biological processes while maintaining computational efficiency.
https://arxiv.org/abs/2409.17282
Electroencephalography (EEG) signals are crucial for investigating brain function and cognitive processes. This study aims to address the challenges of efficiently recording and analyzing high-dimensional EEG signals collected while listening to music in order to recognize emotional states. We propose a method combining Bidirectional Long Short-Term Memory (Bi-LSTM) networks with attention mechanisms for EEG signal processing. Using wearable EEG devices, we collected brain activity data from participants listening to music. The data were preprocessed and segmented, and Differential Entropy (DE) features were extracted. We then constructed and trained a Bi-LSTM model to enhance key feature extraction and improve emotion recognition accuracy. Experiments were conducted on the SEED and DEAP datasets. The Bi-LSTM-AttGW model achieved 98.28% accuracy on the SEED dataset and 92.46% on the DEAP dataset in multi-class emotion recognition tasks, significantly outperforming traditional models such as SVM and EEG-Net. This study demonstrates the effectiveness of combining Bi-LSTM with attention mechanisms, providing robust technical support for applications in brain-computer interfaces (BCI) and affective computing. Future work will focus on improving device design, incorporating multimodal data, and further enhancing emotion recognition accuracy, aiming to achieve practical applications in real-world scenarios.
https://arxiv.org/abs/2408.12124
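As a rough companion to the description above, here is a compact Bi-LSTM-with-attention classifier over sequences of Differential Entropy features. It is not the authors' Bi-LSTM-AttGW model; the feature layout (62 channels x 5 bands per segment, as in SEED-style DE features), hidden size, and number of emotion classes are placeholders.

    import torch
    import torch.nn as nn

    class BiLSTMAttention(nn.Module):
        def __init__(self, n_features=310, hidden=128, n_classes=4):
            super().__init__()
            self.bilstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
            self.attn = nn.Linear(2 * hidden, 1)          # scores each time step
            self.classifier = nn.Linear(2 * hidden, n_classes)

        def forward(self, x):                             # x: (batch, time, n_features)
            h, _ = self.bilstm(x)                         # (batch, time, 2*hidden)
            alpha = torch.softmax(self.attn(h), dim=1)    # attention weights over time
            context = (alpha * h).sum(dim=1)              # weighted summary of the sequence
            return self.classifier(context)

    model = BiLSTMAttention()
    x = torch.randn(8, 20, 310)                           # 8 trials, 20 DE segments each
    logits = model(x)
    print(logits.shape)                                   # torch.Size([8, 4])

The attention weights also give a coarse interpretability handle: they indicate which time segments of the listening session drove the emotion decision.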
In recent years, deep learning has increasingly gained attention in the field of traffic prediction. Existing traffic prediction models often rely on GCNs or attention mechanisms with O(N^2) complexity to dynamically extract traffic node features, which lack efficiency and are not lightweight. Additionally, these models typically only utilize historical data for prediction, without considering the impact of the target information on the prediction. To address these issues, we propose a Pattern-Matching Dynamic Memory Network (PM-DMNet). PM-DMNet employs a novel dynamic memory network to capture traffic pattern features with only O(N) complexity, significantly reducing computational overhead while achieving excellent performance. The PM-DMNet also introduces two prediction methods: Recursive Multi-step Prediction (RMP) and Parallel Multi-step Prediction (PMP), which leverage the time features of the prediction targets to assist in the forecasting process. Furthermore, a transfer attention mechanism is integrated into PMP, transforming historical data features to better align with the predicted target states, thereby capturing trend changes more accurately and reducing errors. Extensive experiments demonstrate the superiority of the proposed model over existing benchmarks. The source codes are available at: this https URL.
https://arxiv.org/abs/2408.07100
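The O(N) claim comes from matching each of the N traffic nodes against a small, fixed-size memory of learned pattern prototypes instead of attending node-to-node. The sketch below shows only that matching step, which is where the linear cost arises; it is an interpretation of the abstract, not the published PM-DMNet code, and the prototype count and feature dimensions are assumed.

    import torch
    import torch.nn as nn

    class PatternMemory(nn.Module):
        """Match N node features against M << N learned prototypes: O(N*M) ~ O(N)."""
        def __init__(self, dim=64, n_prototypes=20):
            super().__init__()
            self.prototypes = nn.Parameter(torch.randn(n_prototypes, dim) * 0.1)

        def forward(self, nodes):                       # nodes: (batch, N, dim)
            scores = nodes @ self.prototypes.t()        # (batch, N, M) similarity
            weights = torch.softmax(scores, dim=-1)     # soft assignment to patterns
            return weights @ self.prototypes            # (batch, N, dim) pattern features

    mem = PatternMemory()
    nodes = torch.randn(4, 307, 64)                     # e.g. 307 sensors, per-node embeddings
    pattern_features = mem(nodes)                       # cost grows linearly in N, not N^2
    print(pattern_features.shape)                       # torch.Size([4, 307, 64])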
Autonomous robots consistently encounter unforeseen dangerous situations during exploration missions. The characteristic rimless wheels of the AsguardIV rover allow it to overcome challenging terrains. However, steep slopes or difficult maneuvers can cause the rover to tip over and threaten the completion of a mission. This work focuses on identifying early signs or initial stages of potential tip-over events in order to predict and detect these critical moments before they fully occur, possibly preventing accidents and enhancing the safety and stability of the rover during its exploration mission. Inertial Measurement Unit (IMU) readings are used to develop compact, robust, and efficient autoencoders that leverage the sequence-processing power of Long Short-Term Memory (LSTM) networks. By leveraging LSTM-based autoencoders, this work contributes predictive capabilities for detecting tip-over risks and developing safety measures for more reliable exploration missions.
https://arxiv.org/abs/2408.05602
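A minimal version of the LSTM autoencoder idea: encode a window of IMU readings into a latent vector, decode it back, and treat a large reconstruction error as a warning sign. The window length, feature count (e.g. 3-axis accelerometer plus gyroscope), and error threshold are assumptions for illustration.

    import torch
    import torch.nn as nn

    class LSTMAutoencoder(nn.Module):
        def __init__(self, n_features=6, hidden=32):
            super().__init__()
            self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
            self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_features)

        def forward(self, x):                             # x: (batch, T, n_features)
            _, (h, _) = self.encoder(x)                   # latent summary of the window
            z = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)  # repeat latent for each step
            dec, _ = self.decoder(z)
            return self.out(dec)                          # reconstructed window

    model = LSTMAutoencoder()
    window = torch.randn(16, 50, 6)                       # 16 windows of 50 IMU samples
    recon = model(window)
    error = ((recon - window) ** 2).mean(dim=(1, 2))      # per-window reconstruction error
    threshold = 1.0                                       # would be calibrated on nominal driving
    print(error > threshold)                              # flags candidate tip-over precursors

Trained only on nominal driving data, the autoencoder reconstructs normal motion well, so windows leading into a tip-over reconstruct poorly and stand out.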
This paper explores an improved Adaboost algorithm based on Long Short-Term Memory networks (LSTMs), which aims to improve the prediction accuracy of user clicks on web page advertisements. By comparing it with several common machine learning algorithms, the paper analyses the advantages of the new model in ad click prediction. The improved algorithm proposed in this paper performs well in user ad click prediction, with an accuracy of 92%, an improvement of 13.6% over the best of the three base models at 78.4%. This significant improvement indicates that the algorithm is better able to capture user behavioural characteristics and time-series patterns. In addition, this paper evaluates the model on further metrics, including accuracy, recall, and F1 score. The results show that the improved Adaboost algorithm based on LSTM is significantly ahead of the traditional models on all these metrics, which further validates its effectiveness and superiority. Especially when facing complex and dynamically changing user behaviours, the model is able to better adapt and make accurate predictions. To ensure the practicality and reliability of the model, this study also examines the accuracy difference between the training set and the test set. After validation, the accuracy of the proposed model on these two datasets differs by only 1.7%, a small difference indicating that the model has good generalisation ability and can be effectively applied to real-world scenarios.
https://arxiv.org/abs/2408.05245
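The boosting side of such a hybrid is standard AdaBoost arithmetic: each base learner receives a weight from its error on a reweighted training set, and misclassified samples gain weight for the next learner. The sketch below shows only that reweighting and the weighted vote, using fixed stand-in predictions; in the paper the base learners are LSTMs trained on the reweighted data, which is omitted here.

    import numpy as np

    def adaboost_weights(base_preds, y):
        """AdaBoost.M1 weighting for binary base learners.

        base_preds: (n_learners, n_samples) predictions in {0, 1}
        y         : (n_samples,) true click labels in {0, 1}
        """
        n = y.shape[0]
        w = np.full(n, 1.0 / n)                      # sample weights
        alphas = []
        for preds in base_preds:
            miss = (preds != y).astype(float)
            err = np.clip((w * miss).sum(), 1e-10, 1 - 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)    # learner weight
            w *= np.exp(alpha * np.where(miss == 1, 1.0, -1.0))
            w /= w.sum()                             # re-normalize sample weights
            alphas.append(alpha)
        return np.array(alphas)

    def ensemble_predict(base_preds, alphas):
        votes = alphas @ (2 * base_preds - 1)        # map {0,1} -> {-1,+1}
        return (votes > 0).astype(int)

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 1000)
    # Stand-ins for LSTM base learners of varying quality (noisy copies of y).
    base = np.stack([np.where(rng.random(1000) < acc, y, 1 - y) for acc in (0.6, 0.7, 0.8)])
    alphas = adaboost_weights(base, y)
    print(alphas.round(3), (ensemble_predict(base, alphas) == y).mean())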
Before the advent of fault-tolerant quantum computers, variational quantum algorithms (VQAs) play a crucial role in noisy intermediate-scale quantum (NISQ) machines. Conventionally, the optimization of VQAs predominantly relies on manually designed optimizers. However, learning to optimize (L2O) demonstrates impressive performance by training small neural networks to replace handcrafted optimizers. In our work, we propose L2O-$g^{\dagger}$, a $\textit{quantum-aware}$ learned optimizer that leverages the Fubini-Study metric tensor ($g^{\dagger}$) and long short-term memory networks. We theoretically derive the update equation inspired by the lookahead optimizer and incorporate the quantum geometry of the optimization landscape in the learned optimizer to balance fast convergence and generalization. Empirically, we conduct comprehensive experiments across a range of VQA problems. Our results demonstrate that L2O-$g^{\dagger}$ not only outperforms the current SOTA hand-designed optimizer without any hyperparameter tuning but also shows strong out-of-distribution generalization compared to previous L2O optimizers. We achieve this by training L2O-$g^{\dagger}$ on just a single generic PQC instance. Our novel $\textit{quantum-aware}$ learned optimizer, L2O-$g^{\dagger}$, presents an advancement in addressing the challenges of VQAs, making it a valuable tool in the NISQ era.
https://arxiv.org/abs/2407.14761
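To make the L2O idea concrete, here is a coordinate-wise LSTM optimizer in the spirit of "learning to learn": a small recurrent network maps each parameter's gradient to an update. The quantum-geometric part of L2O-g† (preconditioning with the Fubini-Study metric g†) is omitted because it requires circuit-specific quantities; this sketch only illustrates the learned-update plumbing on a toy quadratic loss, with assumed sizes, and the optimizer itself is left untrained.

    import torch
    import torch.nn as nn

    class LSTMOptimizer(nn.Module):
        """Coordinate-wise learned optimizer: gradient -> parameter update."""
        def __init__(self, hidden=20):
            super().__init__()
            self.cell = nn.LSTMCell(1, hidden)
            self.out = nn.Linear(hidden, 1)

        def forward(self, grad, state):
            g = grad.detach().reshape(-1, 1)          # one "coordinate" per row
            h, c = self.cell(g, state)
            return 0.01 * self.out(h).reshape(grad.shape), (h, c)

    # Toy optimizee: theta minimizing a quadratic, standing in for PQC parameters.
    torch.manual_seed(0)
    opt_net = LSTMOptimizer()
    theta = torch.randn(8, requires_grad=True)
    target = torch.zeros(8)
    state = (torch.zeros(8, 20), torch.zeros(8, 20))

    for step in range(100):
        loss = ((theta - target) ** 2).sum()
        grad, = torch.autograd.grad(loss, theta)
        update, state = opt_net(grad, state)
        state = tuple(s.detach() for s in state)
        theta = (theta + update).detach().requires_grad_(True)   # apply learned update
        # In L2O the optimizer's own weights are meta-trained over many such trajectories;
        # opt_net here is untrained, so this only demonstrates the update loop.
    print(float(((theta - target) ** 2).sum()))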
This research introduces a novel anomaly detection method designed to enhance the operational reliability of particle accelerators - complex machines that accelerate elementary particles to high speeds for various scientific applications. Our approach utilizes a Long Short-Term Memory (LSTM) neural network to predict the temperature of key components within the magnet power supplies (PSs) of these accelerators, such as heatsinks, capacitors, and resistors, based on the electrical current flowing through the PS. Anomalies are declared when there is a significant discrepancy between the LSTM-predicted temperatures and actual observations. Leveraging a custom-built test stand, we conducted comprehensive performance comparisons with a less sophisticated method, while also fine-tuning hyperparameters of both methods. This process not only optimized the LSTM model but also unequivocally demonstrated the superior efficacy of this new proposed method. The dedicated test stand also facilitated exploratory work on more advanced strategies for monitoring interior PS temperatures using infrared cameras. A proof-of-concept example is provided.
https://arxiv.org/abs/2405.18321
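The anomaly rule itself is simple once the LSTM temperature predictor exists: flag samples where the observed temperature deviates from the prediction by more than a tolerance calibrated on healthy operation. The sketch below shows just that residual test (the LSTM regressor would resemble the forecasting example earlier in this list); the 3-sigma band and the numbers are assumed choices, not the paper's settings.

    import numpy as np

    def flag_anomalies(predicted_temp, observed_temp, calib_residuals, k=3.0):
        """Declare an anomaly where |observed - predicted| exceeds k standard
        deviations of the residuals measured during known-healthy operation."""
        tol = k * np.std(calib_residuals)
        residual = np.abs(observed_temp - predicted_temp)
        return residual > tol

    rng = np.random.default_rng(1)
    healthy_residuals = rng.normal(0.0, 0.2, size=500)       # degrees C, from a healthy run
    predicted = 40 + rng.normal(0.0, 0.1, size=100)          # LSTM-predicted temperatures
    observed = predicted + rng.normal(0.0, 0.2, size=100)
    observed[60] += 3.0                                       # inject a heatsink fault
    print(np.where(flag_anomalies(predicted, observed, healthy_residuals))[0])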
Accurate demand forecasting is crucial for optimizing supply chain management. Traditional methods often fail to capture complex patterns from seasonal variability and special events. Despite advancements in deep learning, interpretable forecasting models remain a challenge. To address this, we introduce the Multi-Channel Data Fusion Network (MCDFN), a hybrid architecture that integrates Convolutional Neural Networks (CNN), Long Short-Term Memory networks (LSTM), and Gated Recurrent Units (GRU) to enhance predictive performance by extracting spatial and temporal features from time series data. Our rigorous benchmarking demonstrates that MCDFN outperforms seven other deep-learning models, achieving superior metrics: MSE (23.5738%), RMSE (4.8553%), MAE (3.9991%), and MAPE (20.1575%). Additionally, MCDFN's predictions were statistically indistinguishable from actual values, confirmed by a paired t-test with a 5% p-value and a 10-fold cross-validated statistical paired t-test. We apply explainable AI techniques like ShapTime and Permutation Feature Importance to enhance interpretability. This research advances demand forecasting methodologies and offers practical guidelines for integrating MCDFN into supply chain systems, highlighting future research directions for scalability and user-friendly deployment.
https://arxiv.org/abs/2405.15598
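A rough structural sketch of a multi-channel fusion network in the spirit of MCDFN: a 1-D CNN channel, an LSTM channel, and a GRU channel each read the same demand history, and their summaries are concatenated before a dense forecasting head. Kernel sizes, hidden widths, and the forecast horizon are placeholders rather than the published configuration.

    import torch
    import torch.nn as nn

    class MultiChannelFusion(nn.Module):
        def __init__(self, n_features=1, hidden=32, horizon=7):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv1d(n_features, hidden, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.gru = nn.GRU(n_features, hidden, batch_first=True)
            self.head = nn.Sequential(nn.Linear(3 * hidden, 64), nn.ReLU(),
                                      nn.Linear(64, horizon))

        def forward(self, x):                               # x: (batch, T, n_features)
            c = self.cnn(x.transpose(1, 2)).squeeze(-1)     # local/spatial view of the series
            _, (h_lstm, _) = self.lstm(x)                   # temporal view 1
            _, h_gru = self.gru(x)                          # temporal view 2
            fused = torch.cat([c, h_lstm[-1], h_gru[-1]], dim=1)
            return self.head(fused)                         # multi-step demand forecast

    model = MultiChannelFusion()
    history = torch.randn(16, 60, 1)                        # 16 series, 60 past demand values
    print(model(history).shape)                             # torch.Size([16, 7])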
Recommender Systems (RSs) provide personalized recommendation services based on user interest and are widely used across various platforms. However, many users have sparse interest due to a lack of consumption behaviors, which leads to poor recommendation results for them. This problem is widespread in large-scale RSs and is particularly difficult to address. To solve it, we propose a novel solution named User Interest Enhancement (UIE), which enhances user interest, including the user profile and user history behavior sequences, using enhancement vectors and a personalized enhancement vector generated from different perspectives via stream clustering and memory networks. UIE not only remarkably improves model performance for users with sparse interest but also significantly enhances model performance for other users. UIE is an end-to-end solution that is easy to implement on top of a ranking model. Moreover, we extend our solution and apply similar methods to long-tail items, which also achieves excellent improvements. Furthermore, we conduct extensive offline and online experiments in a large-scale industrial RS. The results demonstrate that our model outperforms other models remarkably, especially for users with sparse interest. To date, UIE has been fully deployed in multiple large-scale RSs and has achieved remarkable improvements.
https://arxiv.org/abs/2405.13238
Scientists learn early on how to cite scientific sources to support their claims. Sometimes, however, scientists have difficulty determining where a citation should be placed -- or, even worse, fail to cite a source altogether. Automatically detecting sentences that need a citation (i.e., citation worthiness) could solve both of these issues, leading to more robust and well-constructed scientific arguments. Previous researchers have applied machine learning to this task but have used small datasets and models that do not take advantage of recent algorithmic developments such as attention mechanisms in deep learning. We hypothesize that we can develop highly accurate deep learning architectures that learn from large supervised datasets constructed from open access publications. In this work, we propose a Bidirectional Long Short-Term Memory (BiLSTM) network with an attention mechanism and contextual information to detect sentences that need citations. We also produce a new, large dataset (PMOA-CITE) based on the PubMed Open Access Subset, which is orders of magnitude larger than previous datasets. Our experiments show that our architecture achieves state-of-the-art performance on the standard ACL-ARC dataset ($F_{1}=0.507$) and exhibits high performance ($F_{1}=0.856$) on the new PMOA-CITE. Moreover, we show that it can transfer learning across these datasets. We further use interpretable models to illuminate how specific language is used to promote and inhibit citations. We discover that sections and surrounding sentences are crucial for our improved predictions. We further examined purported mispredictions of the model and uncovered systematic human mistakes in citation behavior and source data. This opens the door for our model to check documents during pre-submission and pre-archival procedures. We make this new dataset, the code, and a web-based tool available to the community.
https://arxiv.org/abs/2405.12206
Analog electronic circuits are at the core of an important category of musical devices. The nonlinear features of their electronic components give analog musical devices a distinctive timbre and sound quality, making them highly desirable. Artificial neural networks, particularly recurrent networks, have rapidly gained popularity for the emulation of analog audio effects circuits. While neural approaches have been successful in accurately modeling distortion circuits, they require architectural improvements that account for parameter conditioning and low-latency response. In this article, we explore the application of recent machine learning advancements to virtual analog modeling. We compare State Space models and Linear Recurrent Units against the more common Long Short-Term Memory networks. These have shown promising ability in sequence-to-sequence modeling tasks, with a notable improvement in signal-history encoding. Our comparative study applies these black-box neural modeling techniques to a variety of audio effects. We evaluate their performance and limitations using multiple metrics aimed at assessing the models' ability to accurately replicate energy envelopes, frequency content, and transients in the audio signal. To incorporate control parameters, we employ the Feature-wise Linear Modulation (FiLM) method. Long Short-Term Memory networks exhibit better accuracy in emulating distortions and equalizers, while the State Space model, followed by Long Short-Term Memory networks when integrated in an encoder-decoder structure, outperforms the others in emulating saturation and compression. When considering long time-variant characteristics, the State Space model demonstrates the greatest accuracy. The Long Short-Term Memory and, in particular, the Linear Recurrent Unit networks show a greater tendency to introduce audio artifacts.
https://arxiv.org/abs/2405.04124
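For the conditioning part specifically: Feature-wise Linear Modulation turns the user-facing controls (e.g. gain, tone) into a per-feature scale and shift applied to the network's hidden activations. Below is a small, generic FiLM-conditioned LSTM block for a single audio stream; it illustrates the conditioning mechanism rather than any of the evaluated architectures, and all sizes are assumed.

    import torch
    import torch.nn as nn

    class FiLMConditionedLSTM(nn.Module):
        """Audio-rate LSTM whose hidden features are modulated by control parameters."""
        def __init__(self, hidden=32, n_controls=2):
            super().__init__()
            self.lstm = nn.LSTM(1, hidden, batch_first=True)
            self.film = nn.Linear(n_controls, 2 * hidden)    # -> per-feature (gamma, beta)
            self.out = nn.Linear(hidden, 1)

        def forward(self, audio, controls):
            # audio: (batch, samples, 1); controls: (batch, n_controls) in [0, 1]
            h, _ = self.lstm(audio)
            gamma, beta = self.film(controls).chunk(2, dim=-1)
            h = gamma.unsqueeze(1) * h + beta.unsqueeze(1)   # FiLM: scale and shift features
            return audio + self.out(h)                       # residual around the dry signal

    model = FiLMConditionedLSTM()
    dry = torch.randn(4, 2048, 1)                            # 4 clips of 2048 samples
    knobs = torch.rand(4, 2)                                 # e.g. drive and tone settings
    wet = model(dry, knobs)
    print(wet.shape)                                         # torch.Size([4, 2048, 1])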
Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) have historically dominated sequence modeling tasks such as Machine Translation and Named Entity Recognition (NER). However, the advancement of transformers has led to a shift in this paradigm, given their superior performance. Yet, transformers suffer from $O(N^2)$ attention complexity and challenges in handling inductive bias. Several variations that use spectral networks or convolutions have been proposed to address these issues and have performed well on a range of tasks, but they still have difficulty dealing with long sequences. State Space Models (SSMs) have emerged as promising alternatives for sequence modeling in this context, especially with the advent of S4 and its variants, such as S4nd, Hippo, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), Linear Recurrent Unit (LRU), Liquid-S4, Mamba, etc. In this survey, we categorize the foundational SSMs based on three paradigms, namely Gating architectures, Structural architectures, and Recurrent architectures. This survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long sequence modeling), medical (including genomics), chemical (like drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets like Long Range Arena (LRA), WikiText, GLUE, Pile, ImageNet, Kinetics-400, and sstv2, as well as video datasets such as Breakfast, COIN, and LVU, and various time series datasets. The project page for the Mamba-360 work is available at \url{this https URL}.
https://arxiv.org/abs/2404.16112
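The common core of the models surveyed above is a discretized linear state space recurrence; everything from S4 to Mamba elaborates on how A, B, and C are parameterized, structured, or made input-dependent. A bare-bones version of that recurrence, with arbitrary small matrices chosen only for illustration, is sketched below.

    import numpy as np

    def ssm_scan(A, B, C, D, u):
        """Discrete linear SSM: x_k = A x_{k-1} + B u_k,  y_k = C x_k + D u_k."""
        x = np.zeros(A.shape[0])
        ys = []
        for u_k in u:                       # S4/Mamba replace this sequential loop with a
            x = A @ x + B * u_k             # long convolution or a parallel scan
            ys.append(C @ x + D * u_k)
        return np.array(ys)

    # Toy single-input/single-output system; real models learn structured A (e.g. diagonal).
    rng = np.random.default_rng(0)
    A = 0.9 * np.eye(4) + 0.05 * rng.standard_normal((4, 4))
    B = rng.standard_normal(4)
    C = rng.standard_normal(4)
    D = 0.0
    u = np.sin(np.linspace(0, 8 * np.pi, 200))   # input sequence
    y = ssm_scan(A, B, C, D, u)
    print(y.shape)                               # (200,)

Because the recurrence is linear in the state, the whole sequence map can equivalently be computed as a convolution, which is what gives these models their efficiency on long sequences.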
Deep learning models have become a powerful tool in knee angle estimation for lower limb prostheses, owing to their adaptability across various gait phases and locomotion modes. Current methods utilize Multi-Layer Perceptrons (MLP), Long Short-Term Memory networks (LSTM), and Convolutional Neural Networks (CNN), predominantly analyzing motion information from the thigh. In contrast to these approaches, our study introduces a holistic perspective by integrating whole-body movements as inputs. We propose a transformer-based probabilistic framework, termed the Angle Estimation Probabilistic Model (AEPM), that offers precise angle estimations across extensive scenarios beyond walking. AEPM achieves an overall RMSE of 6.70 degrees, with an RMSE of 3.45 degrees in walking scenarios. Compared to the state of the art, AEPM improves the prediction accuracy for walking by 11.31%. Our method achieves seamless adaptation between different locomotion modes. Also, this model can be utilized to analyze the synergy between the knee and other joints. We reveal that whole-body movement carries valuable information about knee movement, which can provide insights into designing sensors for prostheses. The code is available at this https URL.
https://arxiv.org/abs/2404.06772
With the emergence of pre-trained vision-language models like CLIP, how to adapt them to various downstream classification tasks has garnered significant attention in recent research. The adaptation strategies can be typically categorized into three paradigms: zero-shot adaptation, few-shot adaptation, and the recently-proposed training-free few-shot adaptation. Most existing approaches are tailored for a specific setting and can only cater to one or two of these paradigms. In this paper, we introduce a versatile adaptation approach that can effectively work under all three settings. Specifically, we propose the dual memory networks that comprise dynamic and static memory components. The static memory caches training data knowledge, enabling training-free few-shot adaptation, while the dynamic memory preserves historical test features online during the testing process, allowing for the exploration of additional data insights beyond the training set. This novel capability enhances model performance in the few-shot setting and enables model usability in the absence of training data. The two memory networks employ the same flexible memory interactive strategy, which can operate in a training-free mode and can be further enhanced by incorporating learnable projection layers. Our approach is tested across 11 datasets under the three task settings. Remarkably, in the zero-shot scenario, it outperforms existing methods by over 3\% and even shows superior results against methods utilizing external training data. Additionally, our method exhibits robust performance against natural distribution shifts. Codes are available at \url{this https URL}.
https://arxiv.org/abs/2403.17589
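The static-memory, training-free path can be pictured as a key-value cache over the few-shot training features, in the spirit of cache-based CLIP adapters: a test feature retrieves label votes from the cache by similarity, and the result is blended with the zero-shot classifier. The sketch below is that general recipe with random stand-ins for CLIP features; the blending weight, sharpness, learnable projections, and the dynamic (test-time) memory are omitted or assumed.

    import numpy as np

    def cache_adapted_logits(test_feat, cache_keys, cache_labels, zero_shot_w,
                             alpha=1.0, beta=5.0):
        """Blend zero-shot logits with logits retrieved from a static few-shot cache.

        test_feat   : (d,) L2-normalized test image feature
        cache_keys  : (n_shots, d) L2-normalized training image features
        cache_labels: (n_shots, n_classes) one-hot labels of the cached shots
        zero_shot_w : (d, n_classes) L2-normalized text/classifier embeddings
        """
        affinity = np.exp(-beta * (1.0 - cache_keys @ test_feat))   # similarity to each shot
        cache_logits = affinity @ cache_labels                      # label votes from memory
        return test_feat @ zero_shot_w + alpha * cache_logits

    rng = np.random.default_rng(0)
    d, n_classes, n_shots = 512, 10, 160
    norm = lambda a: a / np.linalg.norm(a, axis=-1, keepdims=True)
    keys = norm(rng.standard_normal((n_shots, d)))
    labels = np.eye(n_classes)[rng.integers(0, n_classes, n_shots)]
    text_w = norm(rng.standard_normal((n_classes, d))).T
    test = norm(rng.standard_normal(d))
    print(np.argmax(cache_adapted_logits(test, keys, labels, text_w)))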
Identifying key nodes in social networks plays a crucial role in timely blocking false information. Existing key node identification methods usually consider node influence only from the propagation structure perspective and have insufficient generalization ability to unknown scenarios. In this paper, we propose a novel Multi-perspective Memory Enhanced Network (MMEN) for identifying key nodes in social networks, which mines key nodes from multiple perspectives and utilizes memory networks to store historical information. Specifically, MMEN first constructs two propagation networks from the perspectives of user attributes and propagation structure and updates node feature representations using graph attention networks. Meanwhile, the memory network is employed to store information of similar subgraphs, enhancing the model's generalization performance in unknown scenarios. Finally, MMEN applies adaptive weights to combine the node influence of the two propagation networks to select the ultimate key nodes. Extensive experiments demonstrate that our method significantly outperforms previous methods.
https://arxiv.org/abs/2403.15235
Modern pre-trained architectures struggle to retain previous information while undergoing continuous fine-tuning on new tasks. Despite notable progress in continual classification, systems designed for complex vision tasks such as detection or segmentation still struggle to attain satisfactory performance. In this work, we introduce a memory-based detection transformer architecture to adapt a pre-trained DETR-style detector to new tasks while preserving knowledge from previous tasks. We propose a novel localized query function for efficient information retrieval from memory units, aiming to minimize forgetting. Furthermore, we identify a fundamental challenge in continual detection referred to as background relegation. This arises when object categories from earlier tasks reappear in future tasks, potentially without labels, leading them to be implicitly treated as background. This is an inevitable issue in continual detection or segmentation. The introduced continual optimization technique effectively tackles this challenge. Finally, we assess the performance of our proposed system on continual detection benchmarks and demonstrate that our approach surpasses the existing state of the art, resulting in 5-7% improvements on MS-COCO and PASCAL-VOC for the task of continual detection.
https://arxiv.org/abs/2403.14797
Anomaly detection in dynamic graphs presents a significant challenge due to the temporal evolution of graph structures and attributes. The conventional approaches that tackle this problem typically employ an unsupervised learning framework, capturing normality patterns from exclusively normal data during training and identifying deviations as anomalies during testing. However, these methods face critical drawbacks: they either depend only on proxy tasks for general representation without directly pinpointing normal patterns, or they neglect to differentiate between spatial and temporal normality patterns, leading to diminished efficacy in anomaly detection. To address these challenges, we introduce a novel Spatial-Temporal memories-enhanced graph autoencoder (STRIPE). Initially, STRIPE employs Graph Neural Networks (GNNs) and gated temporal convolution layers to extract spatial and temporal features, respectively. STRIPE then incorporates separate spatial and temporal memory networks, which capture and store prototypes of normal patterns, thereby preserving the uniqueness of spatial and temporal normality. After that, through a mutual attention mechanism, these stored patterns are retrieved and integrated with the encoded graph embeddings. Finally, the integrated features are fed into the decoder to reconstruct the graph streams, which serves as the proxy task for anomaly detection. This comprehensive approach not only minimizes reconstruction errors but also refines the model by emphasizing the compactness and distinctiveness of the embeddings in relation to the nearest memory prototypes. Through extensive testing, STRIPE has demonstrated a superior capability to discern anomalies by effectively leveraging the distinct spatial and temporal dynamics of dynamic graphs, significantly outperforming existing methodologies, with an average improvement of 15.39% in AUC values.
https://arxiv.org/abs/2403.09039