Anomaly detection in dynamic graphs presents a significant challenge due to the temporal evolution of graph structures and attributes. Conventional approaches to this problem typically employ an unsupervised learning framework, capturing normality patterns from exclusively normal data during training and identifying deviations as anomalies during testing. However, these methods face critical drawbacks: they either rely solely on proxy tasks for general representation learning without directly pinpointing normal patterns, or they neglect to differentiate between spatial and temporal normality patterns, leading to diminished efficacy in anomaly detection. To address these challenges, we introduce a novel Spatial-Temporal memories-enhanced graph autoencoder (STRIPE). First, STRIPE employs Graph Neural Networks (GNNs) and gated temporal convolution layers to extract spatial and temporal features, respectively. STRIPE then incorporates separate spatial and temporal memory networks, which capture and store prototypes of normal patterns, thereby preserving the uniqueness of spatial and temporal normality. Through a mutual attention mechanism, these stored patterns are retrieved and integrated with the encoded graph embeddings. Finally, the integrated features are fed into the decoder to reconstruct the graph streams, which serves as the proxy task for anomaly detection. This comprehensive approach not only minimizes reconstruction errors but also refines the model by emphasizing the compactness and distinctiveness of the embeddings relative to their nearest memory prototypes. Through extensive testing, STRIPE has demonstrated a superior capability to discern anomalies by effectively leveraging the distinct spatial and temporal dynamics of dynamic graphs, significantly outperforming existing methodologies with an average improvement of 15.39% in AUC.
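As a rough illustration of the memory mechanism described above, the sketch below shows one common way a memory-augmented autoencoder retrieves stored prototypes via attention. The function name, dimensions, and single shared memory bank are illustrative assumptions, not the authors' implementation (STRIPE uses separate spatial and temporal memories and a mutual attention mechanism):

```python
import torch
import torch.nn.functional as F

def memory_read(z, memory):
    """Retrieve a convex combination of stored prototypes for each query.

    z:      (N, d) encoded embeddings (spatial or temporal branch)
    memory: (M, d) learnable bank of normal-pattern prototypes
    """
    attn = F.softmax(z @ memory.t(), dim=-1)   # (N, M) similarity to each prototype
    z_mem = attn @ memory                      # (N, d) memory-augmented embedding
    return z_mem, attn

# Illustrative usage: anomalous inputs reconstruct poorly because they are
# forced through combinations of *normal* prototypes only.
z = torch.randn(8, 64)                              # 8 nodes, 64-dim embeddings
memory = torch.nn.Parameter(torch.randn(16, 64))    # 16 prototypes
z_mem, attn = memory_read(z, memory)
```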
https://arxiv.org/abs/2403.09039
Forecasting winners in E-sports with real-time analytics has the potential to further engage audiences watching major tournament events. However, making such real-time predictions is challenging due to unpredictable variables within the game involving diverse player strategies and decision-making. Our work attempts to enhance audience engagement within video game tournaments by introducing a real-time method of predicting wins. Our Long Short-Term Memory (LSTM)-based approach enables efficient prediction of win-lose outcomes using only the health indicator of each player as a time series. As a proof of concept, we evaluate our model's performance within a classic, two-player arcade game, Super Street Fighter II Turbo. We also benchmark our method against state-of-the-art methods for time series forecasting, i.e., Transformer models of the kind found in large language models (LLMs). Finally, we open-source our dataset and code in hopes of furthering work in predictive analysis for arcade games.
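A minimal sketch of what an LSTM win predictor over health time series might look like; the class name, hidden size, and sequence length are assumptions for illustration, not the paper's released code:

```python
import torch
import torch.nn as nn

class HealthLSTM(nn.Module):
    """Binary win/lose classifier over per-frame health bars of both players."""
    def __init__(self, n_players=2, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_players, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):            # x: (batch, timesteps, n_players)
        _, (h, _) = self.lstm(x)     # h: (1, batch, hidden), last hidden state
        return self.head(h[-1])      # (batch, 1) logit for "player 1 wins"

model = HealthLSTM()
health = torch.rand(4, 120, 2)       # 4 matches, 120 timesteps, 2 health values
logits = model(health)
probs = torch.sigmoid(logits)        # real-time win probability
```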
https://arxiv.org/abs/2402.15923
Although supervised machine learning is popular for information extraction from clinical notes, creating large annotated datasets requires extensive domain expertise and is time-consuming. Meanwhile, large language models (LLMs) have demonstrated promising transfer learning capability. In this study, we explored whether recent LLMs can reduce the need for large-scale data annotation. We curated a manually labeled dataset of 769 breast cancer pathology reports, labeled with 13 categories, to compare the zero-shot classification capability of the GPT-4 and GPT-3.5 models with the supervised classification performance of three model architectures: a random forest classifier, long short-term memory networks with attention (LSTM-Att), and the UCSF-BERT model. Across all 13 tasks, the GPT-4 model performed either significantly better than or as well as the best supervised model, the LSTM-Att model (average macro F1 score of 0.83 vs. 0.75). On tasks with high imbalance between labels, the differences were more prominent. Frequent sources of GPT-4 errors included inferences from multiple samples and complex task design. On complex tasks where large annotated datasets cannot be easily collected, LLMs can reduce the burden of large-scale data labeling. However, where the use of LLMs is prohibitive, simpler supervised models with large annotated datasets can provide comparable results. LLMs demonstrated the potential to speed up the execution of clinical NLP studies by reducing the need for curating large annotated datasets. This may result in increased utilization of NLP-based variables and outcomes in observational clinical studies.
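To make the zero-shot setup concrete, here is a hedged sketch of a generic prompt template and the macro-F1 evaluation the study reports; the prompt wording, task name, and labels are illustrative placeholders, and the LLM call itself is omitted:

```python
from sklearn.metrics import f1_score

def zero_shot_prompt(report_text, task, labels):
    """Assemble a zero-shot classification prompt for an LLM (wording illustrative)."""
    return (
        "You are extracting structured fields from a breast cancer pathology report.\n"
        f"Task: {task}\nAllowed answers: {', '.join(labels)}\n"
        f"Report:\n{report_text}\n"
        "Answer with exactly one of the allowed answers."
    )

# After collecting model answers for one of the 13 tasks:
y_true = ["positive", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "negative"]
print(f1_score(y_true, y_pred, average="macro"))  # macro F1, as reported in the study
```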
https://arxiv.org/abs/2401.13887
In the era of information proliferation, discerning the credibility of news content poses an ever-growing challenge. This paper introduces RELIANCE, a pioneering ensemble learning system designed for robust information and fake news credibility evaluation. Comprising five diverse base models, including Support Vector Machines (SVM), naive Bayes, logistic regression, random forests, and Bidirectional Long Short-Term Memory networks (BiLSTMs), RELIANCE employs an innovative approach to integrate their strengths, harnessing the collective intelligence of the ensemble for enhanced accuracy. Experiments demonstrate the superiority of RELIANCE over the individual models, indicating its efficacy in distinguishing between credible and non-credible information sources. RELIANCE also surpasses baseline models in information and news credibility assessment, establishing itself as an effective solution for evaluating the reliability of information sources.
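A plausible scikit-learn rendering of such a soft-voting ensemble over TF-IDF features is sketched below; the feature pipeline is an assumption, and the BiLSTM member would need a scikit-learn-compatible wrapper exposing predict_proba, so it is noted in a comment rather than included:

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Soft voting averages per-class probabilities across heterogeneous base models;
# a BiLSTM member would be added via a wrapper exposing predict_proba.
ensemble = make_pipeline(
    TfidfVectorizer(max_features=20000),
    VotingClassifier(
        estimators=[
            ("svm", SVC(probability=True)),
            ("nb", MultinomialNB()),
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=200)),
        ],
        voting="soft",
    ),
)

# Typical usage, given lists of article texts and 0/1 credibility labels:
# ensemble.fit(train_texts, train_labels)
# preds = ensemble.predict(test_texts)
```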
https://arxiv.org/abs/2401.10940
In the realm of financial decision-making, predicting stock prices is pivotal. Artificial intelligence techniques such as long short-term memory networks (LSTMs), support vector machines (SVMs), and natural language processing (NLP) models are commonly employed to predict such prices. This paper utilizes stock percentage change as training data, in contrast to the traditional use of raw currency values, with a focus on analyzing publicly released news articles. The choice of percentage change aims to provide models with context regarding the significance of price fluctuations and the impact of overall price changes on a given stock. The study employs specialized BERT natural language processing models to predict stock price trends, with a particular emphasis on various data modalities. The results showcase the ability of such strategies, even with a small natural language processing model, to accurately predict overall stock trends, and highlight the effectiveness of certain data features and sector-specific data.
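A small sketch of the percentage-change preprocessing described above, assuming pandas and a simple up/down trend label; the label construction is illustrative, not the paper's exact target definition:

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 99.5, 101.0],
                   index=pd.date_range("2024-01-01", periods=4))

# Percent change normalizes across tickers: a $2 move means something very
# different for a $20 stock than for a $2000 one.
pct = prices.pct_change().dropna() * 100   # approx [2.0, -2.45, 1.51]
labels = (pct > 0).astype(int)             # simple up/down trend target
```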
https://arxiv.org/abs/2401.01487
The number of collisions between aircraft and birds in the airspace has been increasing at an alarming rate over the past decade due to growing bird populations, increasing air traffic, and the use of quieter aircraft. Bird strikes are anticipated to increase dramatically when emerging Advanced Air Mobility aircraft start operating in the low-altitude airspace, where the probability of bird strikes is highest. Not only can such bird strikes result in human and bird fatalities, but they also cost the aviation industry millions of dollars in damages to aircraft annually. To better understand the causes and effects of bird strikes, research to date has mainly focused on analyzing factors which increase the probability of bird strikes, identifying high-risk birds in different locations, predicting the future number of bird strike incidents, and estimating the cost of bird strike damages. However, research on bird movement prediction for use in flight planning algorithms to minimize the probability of bird strikes is very limited. To address this gap, we implement four different types of Long Short-Term Memory (LSTM) models to predict bird movement latitudes and longitudes. A publicly available dataset on the movement of pigeons is utilized to train the models and evaluate their performance. Using the bird flight track predictions, aircraft departures from Cleveland Hopkins airport are simulated to be delayed by varying amounts to avoid potential bird strikes during takeoff. Results demonstrate that the LSTM models can predict bird movement with high accuracy, achieving a Mean Absolute Error of less than 100 meters and outperforming linear and nonlinear regression models. Our findings indicate that incorporating bird movement prediction into flight planning can be highly beneficial.
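The sketch below shows a minimal LSTM next-position predictor over (latitude, longitude) windows plus a haversine helper for a metric MAE like the one reported; the architecture and window size are illustrative assumptions, and the paper trains four LSTM variants rather than this single one:

```python
import math
import torch
import torch.nn as nn

class TrackLSTM(nn.Module):
    """Predict the next (lat, lon) from a window of past positions."""
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(2, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)

    def forward(self, x):                    # x: (batch, window, 2)
        _, (h, _) = self.lstm(x)
        return self.out(h[-1])               # (batch, 2) predicted lat/lon

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters, for a metric MAE like the paper's."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))
```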
https://arxiv.org/abs/2312.12461
Medication recommendation is a vital task for improving patient care and reducing adverse events. However, existing methods often fail to capture the complex and dynamic relationships among patient medical records, drug efficacy and safety, and drug-drug interactions (DDI). In this paper, we propose ALGNet, a novel model that leverages light graph convolutional networks (LGCN) and augmentation memory networks (AMN) to enhance medication recommendation. LGCN can efficiently encode the patient records and the DDI graph into low-dimensional embeddings, while AMN can augment the patient representation with external knowledge from a memory module. We evaluate our model on the MIMIC-III dataset and show that it outperforms several baselines in terms of recommendation accuracy and DDI avoidance. We also conduct an ablation study to analyze the effects of different components of our model. Our results demonstrate that ALGNet can achieve superior performance with less computation and more interpretability. The implementation of this paper can be found at: this https URL.
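For intuition, here is a minimal LightGCN-style propagation step of the kind such light graph convolutional modules build on: no feature transforms or nonlinearities, just repeated neighborhood averaging. The dense adjacency and layer count are simplifying assumptions, not ALGNet's actual implementation:

```python
import torch

def lightgcn_propagate(adj_norm, emb, n_layers=3):
    """LightGCN-style propagation: repeated neighborhood averaging,
    then a mean over the per-layer outputs.

    adj_norm: (N, N) symmetrically normalized adjacency (dense here for brevity)
    emb:      (N, d) initial node embeddings (patients, drugs, ...)
    """
    layers = [emb]
    for _ in range(n_layers):
        layers.append(adj_norm @ layers[-1])
    return torch.stack(layers).mean(dim=0)   # final embedding per node

adj = torch.eye(5)                           # placeholder graph (self-loops only)
emb = torch.randn(5, 16)
out = lightgcn_propagate(adj, emb)
```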
https://arxiv.org/abs/2312.08377
Image captioning bridges the gap between vision and language by automatically generating natural language descriptions for images. Traditional image captioning methods often overlook the preferences and characteristics of users. Personalized image captioning solves this problem by incorporating user prior knowledge into the model, such as writing styles and preferred vocabularies. Most existing methods emphasize the user context fusion process via memory networks or transformers. However, these methods ignore the distinct domains of each dataset and therefore need to update the entire caption model's parameters when meeting new samples, which is time-consuming and computation-intensive. To address this challenge, we propose a novel personalized image captioning framework that leverages user context to account for personality factors. Additionally, our framework utilizes the prefix-tuning paradigm to extract knowledge from a frozen large language model, reducing the gap between different language domains. Specifically, we employ CLIP to extract the visual features of an image and align the semantic space using a query-guided mapping network. By incorporating a transformer layer, we merge the visual features with the user's contextual prior knowledge to generate informative prefixes. Moreover, we employ GPT-2 as the frozen large language model. With a small number of parameters to be trained, our model performs efficiently and effectively. Our model outperforms existing baseline models on the Instagram and YFCC100M datasets across five evaluation metrics, demonstrating its superiority, including twofold improvements in metrics such as BLEU-4 and CIDEr.
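A hedged sketch of the prefix-tuning idea with a frozen GPT-2: a small trainable mapper turns an image feature into pseudo-token embeddings that prime the language model. The MLP mapper, prefix length, and random stand-in for the CLIP feature are assumptions; the paper uses a query-guided mapping network and a transformer layer for user-context fusion:

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel

class PrefixMapper(nn.Module):
    """Map an image feature to k pseudo-token embeddings for a frozen GPT-2."""
    def __init__(self, clip_dim=512, gpt_dim=768, prefix_len=10):
        super().__init__()
        self.prefix_len = prefix_len
        self.proj = nn.Sequential(nn.Linear(clip_dim, gpt_dim * prefix_len), nn.Tanh())

    def forward(self, feat):                          # feat: (batch, clip_dim)
        out = self.proj(feat)
        return out.view(feat.shape[0], self.prefix_len, -1)

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
for p in gpt2.parameters():
    p.requires_grad = False                           # the language model stays frozen

mapper = PrefixMapper()                               # only these weights are trained
feat = torch.randn(1, 512)                            # stand-in for a CLIP image feature
prefix = mapper(feat)                                 # (1, 10, 768)
out = gpt2(inputs_embeds=prefix)                      # decoding is primed by the prefix
```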
https://arxiv.org/abs/2312.04793
In the era of space exploration, the implications of space weather have become increasingly evident. Central to this is the phenomenon of coronal holes, which can significantly influence the functioning of satellites and aircraft. These coronal holes, present on the sun, are distinguished by their open magnetic field lines and comparatively cooler temperatures, leading to the emission of solar wind at heightened rates. To anticipate the effects of these coronal holes on Earth, our study harnesses computer vision to pinpoint the coronal hole regions and estimate their dimensions using imagery from the Solar Dynamics Observatory (SDO). Further, we deploy deep learning methodologies, specifically the Long Short-Term Memory (LSTM) approach, to analyze the trends in the data related to the area of the coronal holes and predict their dimensions across various solar regions over a span of seven days. By evaluating the time series data concerning the area of the coronal holes, our research seeks to uncover patterns in the behavior of coronal holes and comprehend their potential influence on space weather occurrences. This investigation marks a pivotal stride towards bolstering our capacity to anticipate and brace for space weather events that could have ramifications for Earth and its technological apparatuses.
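As a simplified stand-in for the computer-vision step, the sketch below estimates a dark-region area from a normalized solar image by thresholding and pixel counting; the threshold and kilometers-per-pixel scale are illustrative assumptions, not the paper's calibration:

```python
import numpy as np

def dark_region_area(img, threshold=0.2, km_per_pixel=725.0):
    """Estimate coronal-hole area from a normalized SDO frame by counting
    dark pixels (a simplified stand-in for the paper's vision pipeline)."""
    mask = img < threshold                       # coronal holes appear dark in EUV
    return mask.sum() * km_per_pixel ** 2        # area in km^2

img = np.random.rand(512, 512)                   # placeholder for an SDO frame
daily_area = dark_region_area(img)
# A sequence of such daily areas forms the time series fed to the LSTM forecaster.
```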
https://arxiv.org/abs/2301.06732
Development of robust general-purpose 3D segmentation frameworks using the latest deep learning techniques is one of the active topics in various biomedical domains. In this work, we introduce Temporal Cubic PatchGAN (TCuP-GAN), a volume-to-volume translational model that marries the concepts of a generative feature learning framework with Convolutional Long Short-Term Memory networks (LSTMs) for the task of 3D segmentation. We demonstrate the capabilities of our TCuP-GAN on data from four segmentation challenges (Adult Glioma, Meningioma, Pediatric Tumors, and the Sub-Saharan Africa subset) featured within the 2023 Brain Tumor Segmentation (BraTS) Challenge and quantify its performance using LesionWise Dice similarity and $95\%$ Hausdorff Distance metrics. We demonstrate that our framework successfully learns to predict robust multi-class segmentation masks across all the challenges. This benchmarking work serves as a stepping stone for future efforts towards applying TCuP-GAN to other multi-class tasks such as multi-organelle segmentation in electron microscopy imaging.
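For readers unfamiliar with convolutional LSTMs, the sketch below implements a generic ConvLSTM cell, with gates computed by convolutions so the hidden state and memory keep their spatial layout; this is the standard cell, not TCuP-GAN's full volume-to-volume architecture:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """A convolutional LSTM cell: the usual gates, computed with convolutions."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):                  # all: (batch, ch, H, W)
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        c = f * c + i * g                        # per-pixel memory update
        h = o * c.tanh()
        return h, c

cell = ConvLSTMCell(1, 16)
x = torch.randn(2, 1, 64, 64)                    # e.g. one volume slice per step
h = c = torch.zeros(2, 16, 64, 64)
for _ in range(5):                               # feed successive slices
    h, c = cell(x, h, c)                         # (same tensor reused here for brevity)
```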
https://arxiv.org/abs/2311.14148
Quantifying the predictive uncertainty of deep semantic segmentation networks is essential in safety-critical tasks. In applications like autonomous driving, where video data is available, convolutional long short-term memory networks are capable of not only providing semantic segmentations but also predicting the segmentations of the next timesteps. These models use cell states to carry information from previous data, taking a time series of inputs to predict one or even several steps into the future. We present a temporal postprocessing method which estimates the prediction performance of convolutional long short-term memory networks, either by predicting the intersection over union of predicted and ground truth segments or by classifying whether the intersection over union is equal to zero or greater than zero. To this end, we create temporal cell state-based input metrics per segment and investigate different models for estimating the predictive quality based on these metrics. We further study the influence of the number of considered cell states on the proposed metrics.
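A minimal sketch of the quality-estimation setup: a per-segment IoU target and a simple meta-classifier over cell-state-derived metrics. The random features stand in for the paper's actual metrics, and logistic regression is just one plausible choice of estimation model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iou(pred, gt):
    """Intersection over union of two binary masks for one segment."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 0.0

# Per-segment metrics derived from cell states (means, variances, ...) become
# features; the target is IoU > 0 vs IoU == 0 (the meta-classification variant).
X = np.random.rand(100, 8)                 # illustrative cell-state metrics
y = (np.random.rand(100) > 0.3).astype(int)
meta = LogisticRegression().fit(X, y)
```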
https://arxiv.org/abs/2311.07477
Human vision can distinguish between a vast spectrum of colours, estimated at between 2 and 7 million discernible shades. However, this impressive range does not inherently imply that all these colours have been precisely named and described within our lexicon. We often associate colours with familiar objects and concepts in our daily lives. This research endeavors to bridge the gap between our visual perception of countless shades and our ability to articulate and name them accurately. A novel model has been developed to achieve this goal, leveraging Bidirectional Long Short-Term Memory (BiLSTM) networks with active learning. This model operates on a proprietary dataset meticulously curated for this study. The primary objective of this research is to create a versatile tool for categorizing and naming previously unnamed colours or identifying intermediate shades that elude traditional colour terminology. The findings underscore the potential of this innovative approach in revolutionizing our understanding of colour perception and language. Through rigorous experimentation and analysis, this study illuminates a promising avenue for Natural Language Processing (NLP) applications in diverse industries. By facilitating the exploration of the vast colour spectrum, the potential applications of NLP are extended beyond conventional boundaries.
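A short sketch of the uncertainty-sampling step typical of such active-learning loops; the confidence measure and batch size are assumptions, and the Dirichlet draws merely stand in for BiLSTM class probabilities:

```python
import numpy as np

def uncertainty_sampling(probs, k=10):
    """Pick the k pool items the model is least sure about (lowest top-class
    probability); these are sent for labeling in the next round."""
    confidence = probs.max(axis=1)            # confidence of the top class
    return np.argsort(confidence)[:k]         # indices of the least confident items

pool_probs = np.random.dirichlet(np.ones(5), size=200)  # stand-in BiLSTM outputs
to_label = uncertainty_sampling(pool_probs)
# Labeled items move from the pool into the training set and the BiLSTM is retrained.
```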
https://arxiv.org/abs/2311.06542
Maritime transport is paramount to global economic growth and environmental sustainability. In this regard, Automatic Identification System (AIS) data plays a significant role by offering real-time streaming data on vessel movement, which allows for enhanced traffic surveillance, assisting vessel safety by avoiding vessel-to-vessel collisions and proactively preventing vessel-to-whale collisions. This paper tackles a problem intrinsic to trajectory forecasting: effective multi-path long-term vessel trajectory forecasting on engineered sequences of AIS data. We utilize an encoder-decoder model with Bidirectional Long Short-Term Memory networks (Bi-LSTM) to predict the next 12 hours of vessel trajectories using 1 to 3 hours of AIS data. We feed the model probabilistic features engineered from the AIS data that encode the potential route and destination of each trajectory. Leveraging convolutional layers for spatial feature learning and a position-aware attention mechanism that increases the importance of recent timesteps during temporal feature learning, the model forecasts the vessel trajectory with the potential route and destination taken into account. The F1 scores of these features are approximately 85% and 75%, indicating their efficiency in supplementing the neural network. We trialed our model in the Gulf of St. Lawrence, one of the North Atlantic Right Whale (NARW) habitats, achieving an R2 score exceeding 98% with varying techniques and features. Although the high R2 score is partly attributable to well-defined shipping lanes, our model demonstrates superior complex decision-making during path selection. In addition, our model shows enhanced accuracy, with average and median forecasting errors of 11 km and 6 km, respectively. Our study confirms the potential of geographical data engineering and trajectory forecasting models for preserving marine life species.
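One simple reading of "position-aware" attention is a recency bias added to the attention scores before the softmax, as sketched below; the linear bonus and decay rate are illustrative assumptions, not the paper's exact mechanism:

```python
import torch
import torch.nn.functional as F

def position_aware_attention(scores, decay=0.1):
    """Bias attention toward recent timesteps: add a bonus that grows with
    position before the softmax.

    scores: (batch, T) raw attention scores over T encoder timesteps
    """
    T = scores.shape[-1]
    recency = decay * torch.arange(T, dtype=scores.dtype)   # later steps score higher
    return F.softmax(scores + recency, dim=-1)

scores = torch.randn(2, 6)
weights = position_aware_attention(scores)   # recent AIS fixes get more weight
```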
https://arxiv.org/abs/2310.18948
Hopfield networks are widely used in neuroscience as simplified theoretical models of biological associative memory. The original Hopfield networks store memories by encoding patterns of binary associations, which results in a synaptic learning mechanism known as the Hebbian learning rule. Modern Hopfield networks can achieve exponential capacity scaling by using highly non-linear energy functions. However, the energy function of these newer models cannot be straightforwardly compressed into binary synaptic couplings, nor does it directly provide new synaptic learning rules. In this work we show that generative diffusion models can be interpreted as energy-based models and that, when trained on discrete patterns, their energy function is equivalent to that of modern Hopfield networks. This equivalence allows us to interpret the supervised training of diffusion models as a synaptic learning process that encodes the associative dynamics of a modern Hopfield network in the weight structure of a deep neural network. Accordingly, in our experiments we show that the storage capacity of a continuous modern Hopfield network is identical to the capacity of a diffusion model. Our results establish a strong link between generative modeling and the theoretical neuroscience of memory, providing a powerful computational foundation for the reconstructive theory of memory, where creative generation and memory recall can be seen as parts of a unified continuum.
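For reference, the energy of a continuous modern Hopfield network with stored patterns $x_1, \dots, x_N$ takes the standard log-sum-exp form below (as in the modern-Hopfield literature); the paper's contribution is showing that a diffusion model trained on discrete patterns learns an equivalent energy:

```latex
% Energy of a continuous modern Hopfield network over query state \xi,
% with inverse temperature \beta and stored patterns x_1, ..., x_N.
E(\xi) = -\beta^{-1} \log \sum_{i=1}^{N} \exp\!\left(\beta\, x_i^{\top} \xi\right)
         + \tfrac{1}{2}\, \xi^{\top} \xi + \text{const.}
```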
https://arxiv.org/abs/2309.17290
This study provides Urdu poetry generated using different deep-learning techniques and algorithms. The data were collected from the Rekhta website and comprise 1341 text files, each containing several couplets. The poetry data were not from any specific genre or poet; instead, they were a collection of mixed Urdu poems and Ghazals. Different deep learning techniques have been applied, such as Long Short-Term Memory networks (LSTM) and Gated Recurrent Units (GRU). Natural Language Processing (NLP) may be used in machine learning to understand, analyze, and generate language that humans use and understand. Much work has been done on generating poetry for different languages using different techniques, and the collection and use of data have likewise varied across researchers. The primary purpose of this project is to provide a model that generates Urdu poems by using the data in full rather than by sampling it. It also generates poems in pure Urdu rather than the Roman Urdu used in the base paper. The results have shown good accuracy in the poems generated by the model.
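A hedged sketch of the autoregressive sampling loop such generation models use at inference time; `model` and `tokenizer` are hypothetical stand-ins (a trained LSTM/GRU language model returning per-token logits, and its vocabulary codec), not the project's code:

```python
import torch
import torch.nn.functional as F

def sample_poem(model, tokenizer, seed_ids, max_len=60, temperature=0.8):
    """Autoregressive sampling: feed the sequence so far, sample the next
    token from the softmax over logits, append, and repeat.

    `model` is assumed to map a (1, seq_len) id tensor to (1, seq_len, vocab)
    logits; `tokenizer.decode` maps ids back to Urdu text. Both are
    hypothetical stand-ins.
    """
    ids = list(seed_ids)
    for _ in range(max_len):
        logits = model(torch.tensor([ids]))[0, -1] / temperature
        next_id = torch.multinomial(F.softmax(logits, dim=-1), 1).item()
        ids.append(next_id)
    return tokenizer.decode(ids)
```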
https://arxiv.org/abs/2309.14233
Edge computing is a promising solution for handling high-dimensional, multispectral analog data from sensors and IoT devices for applications such as autonomous drones. However, edge devices' limited storage and computing resources make it challenging to perform complex predictive modeling at the edge. Compute-in-memory (CiM) has emerged as a principal paradigm for minimizing energy in deep learning-based inference at the edge. Nevertheless, integrating storage and processing complicates memory cells and/or memory peripherals, essentially trading off area efficiency for energy efficiency. This paper proposes a novel solution to improve area efficiency in deep learning inference tasks. The proposed method employs two key strategies. First, a frequency-domain learning approach uses binarized Walsh-Hadamard transforms, reducing the necessary DNN parameters (by 87% in MobileNetV2) and enabling compute-in-SRAM, which better utilizes parallelism during inference. Second, a memory-immersed collaborative digitization method is described among CiM arrays to reduce the area overheads of conventional ADCs. This accommodates more CiM arrays in limited-footprint designs, leading to better parallelism and reduced external memory accesses. Different networking configurations are explored, where Flash, SA, and their hybrid digitization steps can be implemented using the memory-immersed scheme. The results are demonstrated using a 65 nm CMOS test chip, exhibiting significant area and energy savings compared to a 40 nm-node 5-bit SAR ADC and 5-bit Flash ADC. By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
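The fast Walsh-Hadamard transform at the heart of the frequency-domain approach is multiplication-free, which is what makes it friendly to in-SRAM compute; a textbook implementation is sketched below (the binarization and DNN integration are beyond this snippet):

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform, O(n log n) with only additions and
    subtractions; the length of x must be a power of two."""
    x = np.asarray(x, dtype=float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x

print(fwht([1, 0, 1, 0]))   # [2. 2. 0. 0.]
```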
https://arxiv.org/abs/2309.11048
Electricity demand forecasting is a well-established research field. Usually, this task is performed considering historical loads, weather forecasts, calendar information, and known major events. Recently, attention has been given to the possible use of new sources of information from textual news in order to improve the performance of these predictions. This paper proposes a Long Short-Term Memory (LSTM) network incorporating textual news features that successfully predicts UK national electricity demand on both deterministic and probabilistic tasks. The study finds that public sentiment and word vector representations related to transport and geopolitics have time-continuity effects on electricity demand. The experimental results show that the LSTM with textual features improves by more than 3% over the pure LSTM benchmark and by close to 10% over the official benchmark. Furthermore, the proposed model effectively reduces forecasting uncertainty by narrowing the confidence interval and bringing the forecast distribution closer to the truth.
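A minimal sketch of how news-derived features might be fused with the load series at each timestep; the feature count, window length, and simple concatenation are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class NewsAwareLSTM(nn.Module):
    """Demand forecaster whose input at each step concatenates the lagged load
    with news-derived features (e.g. sentiment, topic word-vector scores)."""
    def __init__(self, n_news_feats=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(1 + n_news_feats, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, load, news):               # (B, T, 1) and (B, T, n_news_feats)
        x = torch.cat([load, news], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h[:, -1])                # next-step demand

model = NewsAwareLSTM()
load = torch.randn(8, 48, 1)                     # 48 half-hourly load lags
news = torch.randn(8, 48, 4)                     # aligned news features
pred = model(load, news)
```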
https://arxiv.org/abs/2309.06793
Introduction: Electroencephalogram (EEG) signals have gained significant popularity in various applications due to their rich information content. However, these signals are prone to contamination from various sources of artifacts, notably the electrooculogram (EOG) artifacts caused by eye movements. The most effective approach to mitigate EOG artifacts involves recording EOG signals simultaneously with EEG and employing blind source separation techniques, such as independent component analysis (ICA). Nevertheless, EOG recordings are not always available, particularly in pre-recorded datasets. Objective: In this paper, we present a novel methodology that combines a long short-term memory (LSTM)-based neural network with ICA to address the challenge of EOG artifact removal from contaminated EEG signals. Approach: Our approach aims to accomplish two primary objectives: 1) estimate the horizontal and vertical EOG signals from the contaminated EEG data, and 2) employ ICA to eliminate the estimated EOG signals from the EEG, thereby producing an artifact-free EEG signal. Main results: To evaluate the performance of our proposed method, we conducted experiments on a publicly available dataset comprising recordings from 27 participants. We employed well-established metrics such as mean squared error, mean absolute error, and mean error to assess the quality of our artifact removal technique. Significance: Furthermore, we compared the performance of our approach with two state-of-the-art deep learning-based methods reported in the literature, demonstrating the superior performance of our proposed methodology.
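A simplified sketch of the ICA stage, assuming the LSTM-estimated EOG channels are already available: append them to the EEG, unmix, zero the components most correlated with the estimated EOG, and reconstruct. The correlation threshold and component-selection rule are assumptions, not the paper's exact procedure:

```python
import numpy as np
from sklearn.decomposition import FastICA

def remove_eog(eeg, eog_est, corr_thresh=0.7):
    """Remove ocular activity from EEG using LSTM-estimated EOG references.

    eeg:     (n_samples, n_eeg_channels)
    eog_est: (n_samples, 2)  estimated horizontal/vertical EOG
    """
    X = np.hstack([eeg, eog_est])
    ica = FastICA(n_components=X.shape[1], random_state=0)
    S = ica.fit_transform(X)                       # sources: (n_samples, n_comp)
    for k in range(S.shape[1]):
        for e in range(eog_est.shape[1]):
            if abs(np.corrcoef(S[:, k], eog_est[:, e])[0, 1]) > corr_thresh:
                S[:, k] = 0.0                      # drop the ocular component
                break
    return ica.inverse_transform(S)[:, : eeg.shape[1]]
```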
https://arxiv.org/abs/2308.13371
Image captioning, like many tasks involving vision and language, currently relies on Transformer-based architectures for extracting the semantics of an image and translating them into linguistically coherent descriptions. Although successful, the attention operator considers only a weighted summation of projections of the current input sample, thereby ignoring the relevant semantic information which can come from the joint observation of other samples. In this paper, we devise a network which can perform attention over activations obtained while processing other training samples, through a prototypical memory model. Our memory models the distribution of past keys and values through the definition of prototype vectors which are both discriminative and compact. Experimentally, we assess the performance of the proposed model on the COCO dataset, in comparison with carefully designed baselines and state-of-the-art approaches, and by investigating the role of each of the proposed components. We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points both when training with cross-entropy only and when fine-tuning with self-critical sequence training. Source code and trained models are available at: this https URL.
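A hedged sketch of attention extended with prototype keys and values shared across samples; the scaling, prototype count, and concatenation scheme are illustrative, not the paper's exact prototypical memory model:

```python
import torch
import torch.nn.functional as F

def attention_with_prototypes(q, k, v, proto_k, proto_v):
    """Let queries attend both to the current sample's keys/values and to
    prototype keys/values distilled from activations of past training samples.

    q, k, v:          (B, T, d) current-sample projections
    proto_k, proto_v: (M, d)    shared prototype memory
    """
    B = q.shape[0]
    k_all = torch.cat([k, proto_k.unsqueeze(0).expand(B, -1, -1)], dim=1)  # (B, T+M, d)
    v_all = torch.cat([v, proto_v.unsqueeze(0).expand(B, -1, -1)], dim=1)
    attn = F.softmax(q @ k_all.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v_all

q = k = v = torch.randn(2, 5, 32)
proto_k, proto_v = torch.randn(8, 32), torch.randn(8, 32)
out = attention_with_prototypes(q, k, v, proto_k, proto_v)
```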
https://arxiv.org/abs/2308.12383
Extensive bedside monitoring in Intensive Care Units (ICUs) has resulted in complex temporal data regarding patient physiology, which presents a large-scale context for clinical data analysis. On the other hand, identifying the time-series patterns within these data may provide strong predictive power for clinical events. Hence, in this work, we investigate the implementation of an automatic data-driven system, which analyzes large amounts of multivariate temporal data derived from Electronic Health Records (EHRs) and extracts high-level information so as to predict in-hospital mortality and Length of Stay (LOS) early. In practice, we investigate the applicability of an LSTM network with the observation window reduced to 6 hours so as to enhance clinical tasks. The experimental results highlight the efficiency of the LSTM model with rigorous multivariate time-series measurements for building real-world prediction engines.
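A small sketch of the 6-hour windowing such a setup implies, assuming hourly-resampled EHR measurements; the padding convention is an illustrative choice, not the paper's preprocessing:

```python
import numpy as np

def first_window(stay, hours=6, step_minutes=60):
    """Take only the first `hours` of a stay's multivariate measurements,
    padding short stays with NaN rows so every input has a fixed shape.

    stay: (n_timesteps, n_variables) hourly-resampled EHR measurements
    """
    steps = hours * 60 // step_minutes
    out = np.full((steps, stay.shape[1]), np.nan)
    out[: min(steps, len(stay))] = stay[:steps]
    return out

stay = np.random.rand(10, 5)        # 10 hourly rows, 5 variables
x = first_window(stay)              # (6, 5) input for the LSTM
```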
https://arxiv.org/abs/2308.12800