Edge computing is a promising solution for handling high-dimensional, multispectral analog data from sensors and IoT devices for applications such as autonomous drones. However, edge devices' limited storage and computing resources make it challenging to perform complex predictive modeling at the edge. Compute-in-memory (CiM) has emerged as a principal paradigm for minimizing the energy of deep learning-based inference at the edge. Nevertheless, integrating storage and processing complicates memory cells and/or memory peripherals, essentially trading off area efficiency for energy efficiency. This paper proposes a novel solution to improve area efficiency in deep learning inference tasks. The proposed method employs two key strategies. First, a frequency-domain learning approach uses binarized Walsh-Hadamard Transforms, reducing the parameters a DNN requires (by 87% in MobileNetV2) and enabling compute-in-SRAM, which better exploits parallelism during inference. Second, a memory-immersed collaborative digitization method is described among CiM arrays to reduce the area overheads of conventional ADCs. This fits more CiM arrays into limited-footprint designs, leading to better parallelism and fewer external memory accesses. Different networking configurations are explored in which Flash, SA, and their hybrid digitization steps can be implemented using the memory-immersed scheme. The results are demonstrated using a 65 nm CMOS test chip, exhibiting significant area and energy savings compared to a 40 nm-node 5-bit SAR ADC and 5-bit Flash ADC. By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
https://arxiv.org/abs/2309.11048
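As a side illustration of the frequency-domain approach above, the (unbinarized) fast Walsh-Hadamard transform can be computed with the standard butterfly recursion; this is a generic sketch, not the paper's compute-in-SRAM implementation:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (butterfly form).
    len(x) must be a power of two; applying it twice yields n * x."""
    x = np.asarray(x, dtype=float).copy()
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # butterfly: sum and difference
        h *= 2
    return x
```

Because the transform matrix contains only ±1 entries, a binarized variant needs no multiplications, which is what makes it attractive for in-memory compute.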
Electricity demand forecasting is a well-established research field. Usually this task is performed considering historical loads, weather forecasts, calendar information, and known major events. Recently, attention has turned to the possible use of new sources of information from textual news to improve the performance of these predictions. This paper proposes a Long Short-Term Memory (LSTM) network incorporating textual news features that successfully handles both deterministic and probabilistic forecasting of UK national electricity demand. The study finds that public sentiment and word-vector representations related to transport and geopolitics have time-continuity effects on electricity demand. The experimental results show that the LSTM with textual features improves by more than 3% over the pure LSTM benchmark and by close to 10% over the official benchmark. Furthermore, the proposed model effectively reduces forecasting uncertainty by narrowing the confidence interval and bringing the forecast distribution closer to the ground truth.
https://arxiv.org/abs/2309.06793
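For reference, a single step of a plain LSTM cell (the recurrent unit underlying the model above) can be written out in NumPy; the gate ordering and shapes here are generic conventions, not the paper's exact architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,); gate order i, f, o, g."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = sigmoid(z[0*H:1*H])        # input gate
    f = sigmoid(z[1*H:2*H])        # forget gate
    o = sigmoid(z[2*H:3*H])        # output gate
    g = np.tanh(z[3*H:4*H])        # candidate cell update
    c_new = f * c + i * g          # cell state carries long-term memory
    h_new = o * np.tanh(c_new)     # hidden state exposed to the next layer
    return h_new, c_new
```

Textual news features would simply be concatenated into the input vector x at each time step.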
Introduction: Electroencephalogram (EEG) signals have gained significant popularity in various applications due to their rich information content. However, these signals are prone to contamination from various sources of artifacts, notably the electrooculogram (EOG) artifacts caused by eye movements. The most effective approach to mitigate EOG artifacts involves recording EOG signals simultaneously with EEG and employing blind source separation techniques, such as independent component analysis (ICA). Nevertheless, the availability of EOG recordings is not always feasible, particularly in pre-recorded datasets. Objective: In this paper, we present a novel methodology that combines a long short-term memory (LSTM)-based neural network with ICA to address the challenge of EOG artifact removal from contaminated EEG signals. Approach: Our approach aims to accomplish two primary objectives: 1) estimate the horizontal and vertical EOG signals from the contaminated EEG data, and 2) employ ICA to eliminate the estimated EOG signals from the EEG, thereby producing an artifact-free EEG signal. Main results: To evaluate the performance of our proposed method, we conducted experiments on a publicly available dataset comprising recordings from 27 participants. We employed well-established metrics such as mean squared error, mean absolute error, and mean error to assess the quality of our artifact removal technique. Significance: Furthermore, we compared the performance of our approach with two state-of-the-art deep learning-based methods reported in the literature, demonstrating the superior performance of our proposed methodology.
https://arxiv.org/abs/2308.13371
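The paper removes the estimated EOG via ICA; as a simpler, commonly used point of comparison, the least-squares regression step for subtracting (known or estimated) EOG channels from EEG looks like this (illustrative only, not the authors' pipeline):

```python
import numpy as np

def remove_eog(eeg, eog):
    """Subtract the least-squares projection of EOG channels from each EEG
    channel. eeg: (n_eeg, T), eog: (n_eog, T). By construction, the
    residual is uncorrelated with the EOG reference channels."""
    B = eeg @ eog.T @ np.linalg.inv(eog @ eog.T)  # propagation coefficients
    return eeg - B @ eog
```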
Image captioning, like many tasks involving vision and language, currently relies on Transformer-based architectures for extracting the semantics in an image and translating it into linguistically coherent descriptions. Although successful, the attention operator only considers a weighted summation of projections of the current input sample, therefore ignoring the relevant semantic information which can come from the joint observation of other samples. In this paper, we devise a network which can perform attention over activations obtained while processing other training samples, through a prototypical memory model. Our memory models the distribution of past keys and values through the definition of prototype vectors which are both discriminative and compact. Experimentally, we assess the performance of the proposed model on the COCO dataset, in comparison with carefully designed baselines and state-of-the-art approaches, and by investigating the role of each of the proposed components. We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points both when training in cross-entropy only and when fine-tuning with self-critical sequence training. Source code and trained models are available at: this https URL.
https://arxiv.org/abs/2308.12383
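The core operation, attending over a compact bank of prototype keys and values instead of all past activations, reduces to ordinary scaled dot-product attention against a fixed memory; how the prototypes are made discriminative and compact is the paper's contribution and is omitted here:

```python
import numpy as np

def prototype_attention(q, proto_k, proto_v):
    """Scaled dot-product attention of queries over a prototype memory.
    q: (n, d); proto_k, proto_v: (m, d). Returns (n, d)."""
    scores = q @ proto_k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)             # softmax over m prototypes
    return w @ proto_v
```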
Extensive bedside monitoring in Intensive Care Units (ICUs) has resulted in complex temporal data regarding patient physiology, which presents a rich context for clinical data analysis. On the other hand, identifying the time-series patterns within these data may offer strong power to predict clinical events. Hence, in this work we investigate the implementation of an automatic data-driven system, which analyzes large amounts of multivariate temporal data derived from Electronic Health Records (EHRs) and extracts high-level information so as to predict in-hospital mortality and Length of Stay (LOS) early. Practically, we investigate the applicability of the LSTM network by reducing the time frame to 6 hours so as to enhance clinical tasks. The experimental results highlight the efficiency of the LSTM model with rigorous multivariate time-series measurements for building real-world prediction engines.
https://arxiv.org/abs/2308.12800
We extend recurrent neural networks to include several flexible timescales for each dimension of their output, which mechanically improves their ability to account for processes with long memory or with highly disparate timescales. We compare the ability of vanilla and extended long short-term memory networks (LSTMs) to predict asset price volatility, known to have a long memory. Generally, the number of epochs needed to train extended LSTMs is halved, while the variation of validation and test losses among models with the same hyperparameters is much smaller. We also show that the model with the smallest validation loss systematically outperforms rough volatility predictions by about 20% when trained and tested on a dataset with multiple time series.
https://arxiv.org/abs/2308.08550
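The "several flexible timescales per output dimension" idea can be pictured with a toy analogue: a bank of exponential moving averages whose decay rates span short and long memories (a sketch of the intuition, not the extended LSTM itself):

```python
import numpy as np

def multiscale_ema(x, alphas):
    """One exponential moving average per decay rate alpha over a 1-D series.
    alpha near 1 = long timescale, alpha near 0 = short. Returns (K, T)."""
    out = np.zeros((len(alphas), len(x)))
    for k, a in enumerate(alphas):
        s = x[0]
        for t, v in enumerate(x):
            s = a * s + (1 - a) * v
            out[k, t] = s
    return out
```

In the extended LSTM, such timescales correspond roughly to fixing several forget-gate decay rates per output dimension instead of learning a single one.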
Partially Observable Markov Decision Processes (POMDPs) can model complex sequential decision-making problems under stochastic and uncertain environments. A main reason hindering their broad adoption in real-world applications is the lack of availability of a suitable POMDP model or a simulator thereof. Available solution algorithms, such as Reinforcement Learning (RL), require the knowledge of the transition dynamics and the observation generating process, which are often unknown and non-trivial to infer. In this work, we propose a combined framework for inference and robust solution of POMDPs via deep RL. First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model, which is conditioned on actions, in order to recover full posterior distributions from the available data. The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization, in order to develop solutions that are robust to model uncertainty. As a further contribution, we compare the use of transformers and long short-term memory networks, which constitute model-free RL solutions, with a model-based/model-free hybrid approach. We apply these methods to the real-world problem of optimal maintenance planning for railway assets.
https://arxiv.org/abs/2307.08082
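Once the action-conditioned hidden Markov model's parameters are sampled, each sample defines an ordinary HMM; the forward algorithm that scores observed data under one such parameter set is the standard recursion below (generic, with discrete observations assumed for illustration):

```python
import numpy as np

def hmm_forward(pi, A, B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM.
    pi: (S,) initial distribution, A: (S, S) row-stochastic transitions,
    B: (S, O) emission probabilities, obs: list of observation indices."""
    alpha = pi * B[:, obs[0]]
    logp = 0.0
    for o in obs[1:]:
        c = alpha.sum()              # scale to avoid numerical underflow
        logp += np.log(c)
        alpha = (alpha / c) @ A * B[:, o]
    return logp + np.log(alpha.sum())
```

MCMC inference repeatedly evaluates likelihoods of this kind while sampling the transition and emission parameters.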
Open-set image recognition is a challenging topic in computer vision. Most existing works in the literature focus on learning more discriminative features from the input images; however, they are usually insensitive to the high- or low-frequency components in features, resulting in decreased performance on fine-grained image recognition. To address this problem, we propose a Complementary Frequency-varying Awareness Network, called CFAN, that better captures both high-frequency and low-frequency information. The proposed CFAN consists of three sequential modules: (i) a feature extraction module is introduced for learning preliminary features from the input images; (ii) a frequency-varying filtering module is designed to separate out both high- and low-frequency components from the preliminary features in the frequency domain via a frequency-adjustable filter; (iii) a complementary temporal aggregation module is designed for aggregating the high- and low-frequency components via two Long Short-Term Memory networks into discriminative features. Based on CFAN, we further propose an open-set fine-grained image recognition method, called CFAN-OSFGR, which learns image features via CFAN and classifies them via a linear classifier. Experimental results on 3 fine-grained datasets and 2 coarse-grained datasets demonstrate that CFAN-OSFGR performs significantly better than 9 state-of-the-art methods in most cases.
https://arxiv.org/abs/2307.07214
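The frequency-varying filtering step can be illustrated with an ideal (brick-wall) split in the FFT domain; the paper's frequency-adjustable filter is learned, so this is only the fixed-cutoff analogue:

```python
import numpy as np

def frequency_split(x, cutoff):
    """Split a real 1-D signal into low- and high-frequency parts that sum
    back to x. cutoff is in cycles per sample (0 to 0.5)."""
    X = np.fft.fft(x)
    freqs = np.abs(np.fft.fftfreq(len(x)))
    low = np.fft.ifft(np.where(freqs <= cutoff, X, 0)).real
    high = np.fft.ifft(np.where(freqs > cutoff, X, 0)).real
    return low, high
```

Because the two masks partition the spectrum, the components are exactly complementary, which is the property the complementary aggregation module exploits.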
Relation extraction (RE) is the task of extracting relations between entities in text. Most RE methods extract relations from free-form running text and leave out other rich data sources, such as tables. We explore RE from the perspective of applying neural methods on tabularly organized data. We introduce a new model consisting of Convolutional Neural Network (CNN) and Bidirectional-Long Short Term Memory (BiLSTM) network to encode entities and learn dependencies among them, respectively. We evaluate our model on a large and recent dataset and compare results with previous neural methods. Experimental results show that our model consistently outperforms the previous model for the task of relation extraction on tabular data. We perform comprehensive error analyses and ablation study to show the contribution of various components of our model. Finally, we discuss the usefulness and trade-offs of our approach, and provide suggestions for fostering further research.
https://arxiv.org/abs/2307.05827
Speech emotion recognition is a challenging task in the speech processing field. For this reason, the feature extraction process is of crucial importance for representing and processing speech signals. In this work, we present a model that feeds raw audio files directly into deep neural networks without any feature extraction stage, for the recognition of emotions on six different data sets: EMO-DB, RAVDESS, TESS, CREMA, SAVEE, and TESS+RAVDESS. To demonstrate the contribution of the proposed model, traditional feature extraction techniques, namely the mel-scale spectrogram and mel-frequency cepstral coefficients, are combined with machine learning algorithms, ensemble learning methods, and deep and hybrid deep learning techniques. Support vector machine, decision tree, naive Bayes, and random forest models are evaluated as machine learning algorithms, while majority voting and stacking are assessed as ensemble learning techniques. Moreover, convolutional neural networks, long short-term memory networks, and a hybrid CNN-LSTM model are evaluated as deep learning techniques and compared with the machine learning and ensemble learning methods. To demonstrate the effectiveness of the proposed model, comparisons with state-of-the-art studies are carried out. Based on the experimental results, the CNN model surpasses existing approaches with 95.86% accuracy on the TESS+RAVDESS data set using raw audio files, thereby setting a new state of the art. In speaker-independent audio categorization problems, the proposed model achieves 90.34% accuracy on EMO-DB with the CNN model, 90.42% on RAVDESS with the CNN model, 99.48% on TESS with the LSTM model, 69.72% on CREMA with the CNN model, and 85.76% on SAVEE with the CNN model.
https://arxiv.org/abs/2307.02820
Graph-level anomaly detection aims to identify abnormal graphs that exhibit deviant structures and node attributes compared to the majority in a graph set. One primary challenge is to learn normal patterns manifested in both fine-grained and holistic views of graphs for identifying graphs that are abnormal in part or in whole. To tackle this challenge, we propose a novel approach called Hierarchical Memory Networks (HimNet), which learns hierarchical memory modules -- node and graph memory modules -- via a graph autoencoder network architecture. The node-level memory module is trained to model fine-grained, internal graph interactions among nodes for detecting locally abnormal graphs, while the graph-level memory module is dedicated to the learning of holistic normal patterns for detecting globally abnormal graphs. The two modules are jointly optimized to detect both locally- and globally-anomalous graphs. Extensive empirical results on 16 real-world graph datasets from various domains show that i) HimNet significantly outperforms state-of-the-art methods and ii) it is robust to anomaly contamination. Codes are available at: this https URL.
https://arxiv.org/abs/2307.00755
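A minimal version of memory-based anomaly scoring, used here only to convey the intuition behind the node- and graph-level memory modules, scores a pattern by its distance to the closest stored normal prototype:

```python
import numpy as np

def memory_anomaly_score(x, memory):
    """Squared distance from x (d,) to the nearest of m stored normal
    patterns in memory (m, d); large values suggest an anomaly."""
    return float(np.min(np.sum((memory - x) ** 2, axis=1)))
```

In HimNet this idea is applied at two granularities: node-level slots flag locally abnormal graphs, graph-level slots flag globally abnormal ones.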
Anticipating audience reaction towards a certain text is integral to several facets of society, ranging from politics and research to commercial industries. Sentiment analysis (SA) is a useful natural language processing (NLP) technique that utilizes lexical/statistical and deep learning methods to determine whether different-sized texts exhibit positive, negative, or neutral emotions. Recurrent networks are widely used in machine-learning communities for problems with sequential data. However, a drawback of models based on Long Short-Term Memory networks and Gated Recurrent Units is the significantly high number of parameters, and thus, such models are computationally expensive. This drawback is even more significant when the available data are limited. Also, such models require significant over-parameterization and regularization to achieve optimal performance. Tensorized models represent a potential solution. In this paper, we classify the sentiment of some social media posts. We compare traditional recurrent models with their tensorized versions, and we show that with the tensorized models, we reach comparable performance with respect to the traditional models while using fewer resources for training.
https://arxiv.org/abs/2306.09705
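The parameter savings that motivate tensorized layers are easiest to see in the simplest (rank-r matrix factorization) case; tensor-train and related decompositions generalize this idea to chains of higher-order factors:

```python
import numpy as np

def factorized_linear(x, factors):
    """Apply a chain of small factors in place of one dense matrix:
    y = x @ W1 @ W2 @ ...  (equivalent to y = x @ (W1 @ W2 @ ...))."""
    for W in factors:
        x = x @ W
    return x

def param_counts(d_in, d_out, rank):
    """Parameter counts: dense layer vs. a rank-r two-factor decomposition."""
    return d_in * d_out, d_in * rank + rank * d_out
```

For a 256x256 layer at rank 8, the factorization stores 4,096 parameters instead of 65,536, which is why such models train with fewer resources.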
The RoboCup competitions comprise various leagues, and the Soccer Simulation 2D League is a major one among them. A Soccer Simulation 2D (SS2D) match involves two teams, each with 11 players and a coach, competing against each other. The players can only communicate with the Soccer Simulation Server during the game. This paper presents the latest research of the CYRUS soccer simulation 2D team, the champion of RoboCup 2021. We explain our denoising idea powered by long short-term memory networks (LSTM) and deep neural networks (DNN). The CYRUS team uses the CYRUS2D base code, which was developed based on the Helios and Gliders bases.
https://arxiv.org/abs/2305.19283
Continual learning on sequential data is critical for many machine learning (ML) deployments. Unfortunately, LSTM networks, which are commonly used to learn on sequential data, suffer from catastrophic forgetting and are limited in their ability to learn multiple tasks continually. We discover that catastrophic forgetting in LSTM networks can be overcome in two novel and readily-implementable ways -- separating the LSTM memory either for each task or for each target label. Our approach eschews the need for explicit regularization, hypernetworks, and other complex methods. We quantify the benefits of our approach on recently-proposed LSTM networks for computer memory access prefetching, an important sequential learning problem in ML-based computer system optimization. Compared to state-of-the-art weight regularization methods to mitigate catastrophic forgetting, our approach is simple, effective, and enables faster learning. We also show that our proposal enables the use of small, non-regularized LSTM networks for complex natural language processing in the offline learning scenario, which was previously considered difficult.
https://arxiv.org/abs/2305.17244
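The first remedy, keeping a separate LSTM memory per task, amounts to a small keyed state store; the class below is a hypothetical sketch of that bookkeeping (class and method names are ours, not the authors'):

```python
import numpy as np

class TaskSeparatedMemory:
    """Hold an independent LSTM state (h, c) per task id, so learning one
    task never overwrites another task's memory."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size
        self.states = {}

    def get(self, task_id):
        # Lazily create a zeroed state the first time a task is seen.
        if task_id not in self.states:
            z = np.zeros(self.hidden_size)
            self.states[task_id] = (z.copy(), z.copy())
        return self.states[task_id]

    def set(self, task_id, h, c):
        self.states[task_id] = (h, c)
```

The per-target-label variant is identical except that the dictionary is keyed by label rather than by task.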
Breakthroughs in deep learning and memory networks have led to major advances in natural language understanding. Language is sequential, and information carried through the sequence can be captured by memory networks. Learning the sequence is one of the key aspects of learning the language. However, memory networks are not capable of holding infinitely long sequences in their memories and are limited by various constraints such as the vanishing or exploding gradient problem. Therefore, natural language understanding models are affected when presented with long sequential text. We introduce the Long Term Memory network (LTM) to learn from infinitely long sequences. LTM gives priority to the current inputs to allow them to have a high impact. Language modeling is an important factor in natural language understanding. LTM was tested on language modeling, which requires long-term memory, using the Penn Treebank dataset, the Google Billion Word dataset, and the WikiText-2 dataset. We compare LTM with other language models which require long-term memory.
https://arxiv.org/abs/2305.11462
Existing question answering methods often assume that the input content (e.g., documents or videos) is always accessible to solve the task. Alternatively, memory networks were introduced to mimic the human process of incremental comprehension and compression of the information in a fixed-capacity memory. However, these models only learn how to maintain memory by backpropagating errors in the answers through the entire network. Instead, it has been suggested that humans have effective mechanisms to boost their memorization capacities, such as rehearsal and anticipation. Drawing inspiration from these, we propose a memory model that performs rehearsal and anticipation while processing inputs to memorize important information for solving question answering tasks from streaming data. The proposed mechanisms are applied in a self-supervised manner during training through masked modeling tasks focused on coreference information. We validate our model on a short-sequence (bAbI) dataset as well as large-sequence textual (NarrativeQA) and video (ActivityNet-QA) question answering datasets, where it achieves substantial improvements over previous memory network approaches. Furthermore, our ablation study confirms the proposed mechanisms' importance for memory models.
https://arxiv.org/abs/2305.07565
We address an important yet challenging problem - modeling high-dimensional dependencies across multivariates such as financial indicators in heterogeneous markets. In reality, a market couples with and influences others over time, and the financial variables of a market are also coupled. We make the first attempt to integrate variational sequential neural learning with copula-based dependence modeling to characterize both temporal observable and latent variable-based dependence degrees and structures across non-normal multivariates. Our variational neural network WPVC-VLSTM models variational sequential dependence degrees and structures across multivariate time series by variational long short-term memory networks and regular vine copula. The regular vine copula models non-normal and long-range distributional couplings across multiple dynamic variables. WPVC-VLSTM is verified in terms of both technical significance and portfolio forecasting performance. It outperforms benchmarks including linear models, stochastic volatility models, deep neural networks, and variational recurrent networks in cross-market portfolio forecasting.
https://arxiv.org/abs/2305.08778
Car accidents remain a significant public safety issue worldwide, with the majority of them attributed to driver errors stemming from inadequate driving knowledge, non-compliance with regulations, and poor driving habits. To improve road safety, Driving Behavior Detection (DBD) systems have been proposed in several studies to identify safe and unsafe driving behavior. Many of these studies have utilized sensor data obtained from the Controller Area Network (CAN) bus to construct their models. However, the use of publicly available sensors is known to reduce the accuracy of detection models, while incorporating vendor-specific sensors into the dataset increases accuracy. To address the limitations of existing approaches, we present a reliable DBD system based on Graph Convolutional Long Short-Term Memory Networks (GConvLSTM) that enhances the precision and practicality of DBD models using public sensors. Additionally, we incorporate non-public sensors to evaluate the model's effectiveness. Our proposed model achieved a high accuracy of 97.5% for public sensors and an average accuracy of 98.1% for non-public sensors, indicating its consistency and accuracy in both settings. To enable local driver behavior analysis, we deployed our DBD system on a Raspberry Pi at the network edge, with drivers able to access daily driving condition reports, sensor data, and prediction results through a monitoring dashboard. Furthermore, the dashboard issues voice warnings to alert drivers of hazardous driving conditions. Our findings demonstrate that the proposed system can effectively detect hazardous and unsafe driving behavior, with potential applications in improving road safety and reducing the number of accidents caused by driver errors.
https://arxiv.org/abs/2305.05670
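The graph-convolution half of GConvLSTM can be illustrated with the standard symmetrically normalized propagation rule over an adjacency matrix with self-loops (a generic GCN step, assumed here for illustration, not the paper's exact layer):

```python
import numpy as np

def gconv(A, X, W):
    """One graph-convolution step: D^{-1/2} (A + I) D^{-1/2} X W.
    A: (N, N) adjacency, X: (N, F) node features, W: (F, F') weights."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)                       # degree of each node
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W
```

In GConvLSTM, steps of this form replace the dense matrix multiplications inside the LSTM gates, letting the gates respect the sensor graph's structure.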
In this paper, we propose using deep neural networks to extract important information from Vietnamese legal questions, a fundamental task towards building a question answering system in the legal domain. Given a legal question in natural language, the goal is to extract all the segments that contain the needed information to answer the question. We introduce a deep model that solves the task in three stages. First, our model leverages recent advanced autoencoding language models to produce contextual word embeddings, which are then combined with character-level and POS-tag information to form word representations. Next, bidirectional long short-term memory networks are employed to capture the relations among words and generate sentence-level representations. At the third stage, borrowing ideas from graph-based dependency parsing methods which provide a global view on the input sentence, we use biaffine classifiers to estimate the probability of each pair of start-end words to be an important segment. Experimental results on a public Vietnamese legal dataset show that our model outperforms the previous work by a large margin, achieving 94.79% in the F1 score. The results also prove the effectiveness of using contextual features extracted from pre-trained language models combined with other types of features such as character-level and POS-tag features when training on a limited dataset.
https://arxiv.org/abs/2304.14447
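The biaffine scorer in the third stage assigns a score to every (start, end) word pair; in its usual form (assumed here, with hypothetical parameter names) it combines a bilinear term with two linear terms and a bias:

```python
import numpy as np

def biaffine_scores(H_s, H_e, U, u_s, u_e, b):
    """Score matrix s[i, j] = h_i^T U h_j + u_s . h_i + u_e . h_j + b,
    for start representations H_s (n, d) and end representations H_e (n, d).
    Each s[i, j] scores words i..j as a candidate important segment."""
    bilinear = H_s @ U @ H_e.T
    return bilinear + (H_s @ u_s)[:, None] + (H_e @ u_e)[None, :] + b
```

A sigmoid over each entry then gives the probability that the pair bounds an important segment.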
Fitness for Duty (FFD) techniques detect whether a subject is Fit to perform their work safely, meaning no reduced alertness and no safety risk, or Unfit, meaning alertness reduced by sleepiness or consumption of alcohol and drugs. Human iris behaviour provides valuable information for predicting FFD, since pupil and iris movements are controlled by the central nervous system and are influenced by illumination, fatigue, alcohol, and drugs. This work aims to classify FFD using sequences of 8 iris images and to extract spatial and temporal information using Convolutional Neural Networks (CNN) and Long Short-Term Memory networks (LSTM). Our results achieved a precision of 81.4% and 96.9% for the prediction of Fit and Unfit subjects, respectively. The results also show that it is possible to determine whether a subject is under alcohol, drug, or sleepiness conditions. Sleepiness proved to be the most difficult condition to determine. This system opens a different insight into iris biometric applications.
https://arxiv.org/abs/2304.11858