Vehicular communication systems face significant challenges due to high mobility and rapidly changing environments, which affect the channel over which the signals travel. To address these challenges, neural network (NN)-based channel estimation methods have been suggested. These methods are primarily trained at high signal-to-noise ratio (SNR), on the assumption that training an NN in less noisy conditions results in good generalisation. This study examines the effectiveness of training NN-based channel estimators on mixed SNR datasets compared to training solely on high SNR datasets, as is done in several related works. Estimators evaluated in this work include an architecture that uses convolutional layers and self-attention mechanisms; a method that employs temporal convolutional networks and data pilot-aided estimation; two methods that combine classical methods with multilayer perceptrons; and the current state-of-the-art model, which combines Long Short-Term Memory networks with data pilot-aided and temporal averaging methods as post-processing. Our results indicate that using only high SNR data for training is not always optimal, and that the SNR range of the training dataset should be treated as a hyperparameter that can be tuned for better performance. This is illustrated by the better performance of some models in low SNR conditions when trained on the mixed SNR dataset rather than exclusively on high SNR data.
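A minimal sketch of the training-set design being compared, not the paper's pipeline: received pilot observations are generated over either a narrow high-SNR range or a wide mixed-SNR range, so the SNR interval can be swept like any other hyperparameter. The Rayleigh channel model, pilot pattern, and dataset sizes are illustrative assumptions.

```python
# Sketch: assembling high-SNR-only vs. mixed-SNR training sets for an NN channel estimator.
import numpy as np

rng = np.random.default_rng(0)

def awgn(signal, snr_db):
    """Add complex white Gaussian noise at the requested SNR (in dB)."""
    p_signal = np.mean(np.abs(signal) ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    noise = np.sqrt(p_noise / 2) * (rng.standard_normal(signal.shape)
                                    + 1j * rng.standard_normal(signal.shape))
    return signal + noise

def make_dataset(num_frames, snr_range_db, num_subcarriers=64):
    """Pairs of (noisy received pilots, true channel), SNR drawn per frame from a range."""
    X, Y = [], []
    for _ in range(num_frames):
        h = (rng.standard_normal(num_subcarriers)
             + 1j * rng.standard_normal(num_subcarriers)) / np.sqrt(2)  # toy Rayleigh taps
        pilots = np.ones(num_subcarriers)            # known pilot symbols
        snr_db = rng.uniform(*snr_range_db)          # SNR range as a hyperparameter
        Y.append(h)
        X.append(awgn(h * pilots, snr_db))           # received pilot observation
    return np.stack(X), np.stack(Y)

# High-SNR-only training (common in prior work) vs. mixed-SNR training.
X_high, Y_high = make_dataset(10_000, snr_range_db=(30, 40))
X_mix,  Y_mix  = make_dataset(10_000, snr_range_db=(0, 40))
```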
https://arxiv.org/abs/2502.06824
Accurate detection of traffic anomalies is crucial for effective urban traffic management and congestion mitigation. We use the Spatiotemporal Generative Adversarial Network (STGAN) framework combining Graph Neural Networks and Long Short-Term Memory networks to capture complex spatial and temporal dependencies in traffic data. We apply STGAN to real-time, minute-by-minute observations from 42 traffic cameras across Gothenburg, Sweden, collected over several months in 2020. The images are processed to compute a flow metric representing vehicle density, which serves as input for the model. Training is conducted on data from April to November 2020, and validation is performed on a separate dataset from November 14 to 23, 2020. Our results demonstrate that the model effectively detects traffic anomalies with high precision and low false positive rates. The detected anomalies include camera signal interruptions, visual artifacts, and extreme weather conditions affecting traffic flow.
https://arxiv.org/abs/2502.01391
A crucial step to efficiently integrate Whole Slide Images (WSIs) in computational pathology is assigning a single high-quality feature vector, i.e., one embedding, to each WSI. With the existence of many pre-trained deep neural networks and the emergence of foundation models, extracting embeddings for sub-images (i.e., tiles or patches) is straightforward. However, for WSIs, given their high resolution and gigapixel nature, inputting them into existing GPUs as a single image is not feasible. As a result, WSIs are usually split into many patches. By feeding each patch to a pre-trained model, each WSI can then be represented by a set of patches and, hence, a set of embeddings. In such a setup, WSI representation learning reduces to set representation learning, where for each WSI we have access to a set of patch embeddings. To obtain a single embedding from a set of patch embeddings for each WSI, multiple set-based learning schemes have been proposed in the literature. In this paper, we evaluate the WSI search performance of multiple recently developed aggregation techniques (mainly set representation learning techniques), including simple average or max pooling operations, Deep Sets, Memory networks, Focal attention, Gaussian Mixture Model (GMM) Fisher Vector, and deep sparse and binary Fisher Vector, on four different primary sites from TCGA, namely bladder, breast, kidney, and colon. Further, we benchmark the search performance of these methods against the median of minimum distances of patch embeddings, a non-aggregating approach used for WSI retrieval.
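A minimal sketch (not the paper's code) of the simplest aggregation baselines it evaluates: average and max pooling over a WSI's patch embeddings, followed by nearest-neighbour WSI search with cosine distance. Embedding dimensions, archive size, and the random data are placeholders.

```python
# Sketch: pooling patch embeddings into one WSI embedding, then ranking an archive.
import numpy as np

rng = np.random.default_rng(0)

def aggregate(patch_embeddings, mode="mean"):
    """Collapse a (num_patches, dim) set of patch embeddings into one WSI embedding."""
    if mode == "mean":
        return patch_embeddings.mean(axis=0)
    if mode == "max":
        return patch_embeddings.max(axis=0)
    raise ValueError(f"unknown mode: {mode}")

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy archive: 100 WSIs, each with a variable number of 384-d patch embeddings.
archive = [rng.standard_normal((rng.integers(50, 200), 384)) for _ in range(100)]
archive_embeddings = np.stack([aggregate(p, "mean") for p in archive])

# Query: aggregate the query WSI the same way, then rank the archive by distance.
query = aggregate(rng.standard_normal((120, 384)), "mean")
ranking = np.argsort([cosine_distance(query, e) for e in archive_embeddings])
print("closest WSIs:", ranking[:5])
```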
https://arxiv.org/abs/2501.17822
Emotion recognition is a critical task in human-computer interaction, enabling more intuitive and responsive systems. This study presents a multimodal emotion recognition system that combines low-level information from audio and text, leveraging both Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory Networks (BiLSTMs). The proposed system consists of two parallel networks: an Audio Block and a Text Block. Mel Frequency Cepstral Coefficients (MFCCs) are extracted and processed by a BiLSTM network and a 2D convolutional network to capture low-level intrinsic and extrinsic features from speech. Simultaneously, a combined BiLSTM-CNN network extracts the low-level sequential nature of text from word embeddings corresponding to the available audio. This low-level information from speech and text is then concatenated and processed by several fully connected layers to classify the speech emotion. Experimental results demonstrate that the proposed EmoTech accurately recognizes emotions from combined audio and text inputs, achieving an overall accuracy of 84%. This solution outperforms previously proposed approaches for the same dataset and modalities.
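A hedged sketch of the two-branch structure described above, omitting the convolutional sub-networks for brevity: a BiLSTM audio block over MFCC frames and a BiLSTM text block over word embeddings, concatenated and classified by fully connected layers. All dimensions, the pooling choice, and the 4-class output are assumptions, not the EmoTech implementation.

```python
# Sketch: parallel audio and text blocks fused by concatenation for emotion classification.
import torch
import torch.nn as nn

class TwoBlockEmotionNet(nn.Module):
    def __init__(self, n_mfcc=40, embed_dim=300, hidden=128, num_classes=4):
        super().__init__()
        self.audio_lstm = nn.LSTM(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.text_lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(4 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, num_classes)
        )

    def forward(self, mfcc_seq, word_emb_seq):
        # Mean-pool the BiLSTM outputs over time for each modality, then concatenate.
        audio_feat = self.audio_lstm(mfcc_seq)[0].mean(dim=1)    # (B, 2*hidden)
        text_feat = self.text_lstm(word_emb_seq)[0].mean(dim=1)  # (B, 2*hidden)
        return self.classifier(torch.cat([audio_feat, text_feat], dim=-1))

model = TwoBlockEmotionNet()
logits = model(torch.randn(8, 200, 40), torch.randn(8, 30, 300))  # (batch, time, features)
print(logits.shape)  # torch.Size([8, 4])
```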
https://arxiv.org/abs/2501.12674
Long-range sequence modeling is a crucial aspect of natural language processing and time series analysis. However, traditional models like Recurrent Neural Networks (RNNs) and Transformers suffer from computational and memory inefficiencies, especially when dealing with long sequences. This paper introduces Logarithmic Memory Networks (LMNs), a novel architecture that leverages a hierarchical logarithmic tree structure to efficiently store and retrieve past information. LMNs dynamically summarize historical context, significantly reducing the memory footprint and computational complexity of attention mechanisms from O(n^2) to O(log(n)). The model employs a single-vector, targeted attention mechanism to access stored information, and the memory block construction worker (summarizer) layer operates in two modes: a parallel execution mode during training for efficient processing of hierarchical tree structures and a sequential execution mode during inference, which acts as a memory management system. It also implicitly encodes positional information, eliminating the need for explicit positional encodings. These features make LMNs a robust and scalable solution for processing long-range sequences in resource-constrained environments, offering practical improvements in efficiency and scalability. The code is publicly available under the MIT License on GitHub: this https URL.
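An illustrative sketch only, in the spirit of the sequential (inference-time) mode sketched above: incoming token vectors are merged pairwise so that at most one summary per power-of-two block size is kept, giving O(log n) memory slots, and a single query vector attends over those summaries. The mean-merge rule and slot layout are assumptions for clarity, not the LMN layer itself.

```python
# Sketch: a logarithmic summary buffer with single-vector attention over its slots.
import numpy as np

class LogMemory:
    def __init__(self):
        self.slots = []  # list of (block_size, summary_vector), sizes strictly increasing

    def write(self, token):
        size, summary = 1, np.asarray(token, dtype=float)
        # Merge equal-sized blocks, like carries in a binary counter.
        while self.slots and self.slots[-1][0] == size:
            prev_size, prev_summary = self.slots.pop()
            summary = (prev_summary + summary) / 2.0
            size += prev_size
        self.slots.append((size, summary))

    def read(self, query):
        # Single-vector targeted attention over the stored summaries.
        keys = np.stack([s for _, s in self.slots])
        scores = keys @ query
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ keys

mem = LogMemory()
for _ in range(1000):
    mem.write(np.random.randn(16))
print(len(mem.slots))                            # at most ~log2(1000) slots, not 1000
print(mem.read(np.random.randn(16)).shape)       # (16,)
```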
https://arxiv.org/abs/2501.07905
Parkinson's Disease (PD) is a degenerative neurological disorder that impairs motor and non-motor functions, significantly reducing quality of life and increasing mortality risk. Early and accurate detection of PD progression is vital for effective management and improved patient outcomes. Current diagnostic methods, however, are often costly, time-consuming, and require specialized equipment and expertise. This work proposes an innovative approach to predicting PD progression using regression methods, Long Short-Term Memory (LSTM) networks, and Kolmogorov-Arnold Networks (KAN). KAN, utilizing spline-parametrized univariate functions, allows for dynamic learning of activation patterns, unlike traditional linear models. The Movement Disorder Society-Sponsored Revision of the Unified Parkinson's Disease Rating Scale (MDS-UPDRS) is a comprehensive tool for evaluating PD symptoms and is commonly used to measure disease progression. Additionally, protein or peptide abnormalities are linked to PD onset and progression. Identifying these associations can aid in predicting disease progression and understanding molecular changes. Comparing multiple models, including LSTM and KAN, this study aims to identify the method that delivers the best predictive performance. The analysis reveals that KAN, with its dynamic learning capabilities, outperforms other approaches in predicting PD progression. This research highlights the potential of AI and machine learning in healthcare, paving the way for advanced computational models to enhance clinical predictions and improve patient care and treatment strategies in PD management.
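For context, the Kolmogorov-Arnold representation that KANs build on is stated below; KAN layers replace the fixed univariate functions with learnable spline-parametrized edge functions. The notation here is the standard textbook form, not taken from the paper.

```latex
% Kolmogorov-Arnold representation: every continuous multivariate function decomposes
% into sums and compositions of univariate functions \Phi_q and \phi_{q,p}, which KANs
% parameterize with learnable splines on the network edges.
f(x_1,\dots,x_n) \;=\; \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```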
https://arxiv.org/abs/2412.20744
Accurate tool wear prediction is essential for maintaining productivity and minimizing costs in machining. However, the complex nature of the tool wear process poses significant challenges to achieving reliable predictions. This study explores data-driven methods, in particular deep learning, for tool wear prediction. Traditional data-driven approaches often focus on a single process, relying on multi-sensor setups and extensive data generation, which limits generalization to new settings. Moreover, multi-sensor integration is often impractical in industrial environments. To address these limitations, this research investigates the transferability of predictive models using minimal training data, validated across two processes. Furthermore, it uses a simple setup with a single acceleration sensor to establish a low-cost data generation approach that facilitates the generalization of models to other processes via transfer learning. The study evaluates several machine learning models, including convolutional neural networks (CNN), long short-term memory networks (LSTM), support vector machines (SVM) and decision trees, trained on different input formats such as feature vectors and short-time Fourier transform (STFT). The performance of the models is evaluated on different amounts of training data, including scenarios with significantly reduced datasets, providing insight into their effectiveness under constrained data conditions. The results demonstrate the potential of specific models and configurations for effective tool wear prediction, contributing to the development of more adaptable and efficient predictive maintenance strategies in machining. Notably, the ConvNeXt model shows exceptional performance, achieving 99.1% accuracy in identifying tool wear using data from only four milling tools operated until they were worn.
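A hedged sketch of the STFT input representation mentioned above, computed from a single acceleration channel. The sampling rate, window length, and synthetic signal are placeholders; the paper's actual acquisition settings are not reproduced here.

```python
# Sketch: turning a single acceleration signal into an STFT magnitude map for a CNN.
import numpy as np
from scipy.signal import stft

fs = 10_000                       # assumed sampling rate of the acceleration sensor [Hz]
t = np.arange(0, 2.0, 1 / fs)
accel = np.sin(2 * np.pi * 350 * t) + 0.3 * np.random.randn(t.size)  # toy vibration signal

# Short-time Fourier transform: a (frequency x time) magnitude map as model input.
f, tau, Z = stft(accel, fs=fs, nperseg=512, noverlap=384)
spectrogram = np.abs(Z)
print(spectrogram.shape)          # (num_freq_bins, num_time_frames)
```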
https://arxiv.org/abs/2412.19950
This project aims to develop a robust video surveillance system that can segment videos into smaller clips based on the detection of activities. It uses CCTV footage, for example, to record only major events, such as the appearance of a person or a thief, so that storage is optimized and digital searches are easier. It utilizes the latest techniques in object detection and tracking, including Convolutional Neural Networks (CNNs) such as YOLO, SSD, and Faster R-CNN, as well as Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs), to achieve high detection accuracy and capture temporal dependencies. The approach incorporates adaptive background modeling through Gaussian Mixture Models (GMM) and optical flow methods such as Lucas-Kanade to detect motion. Multi-scale and contextual analysis are used to improve detection across different object sizes and environments. A hybrid motion segmentation strategy combines statistical and deep learning models to manage complex movements, while optimizations for real-time processing ensure efficient computation. Tracking methods, such as Kalman filters and Siamese networks, are employed to maintain smooth tracking even in cases of occlusion. Results demonstrate high precision and recall in detecting and tracking objects, with significant improvements in processing times and accuracy due to real-time optimizations and illumination-invariant features. The impact of this research lies in its potential to transform video surveillance, reducing storage requirements and enhancing security through reliable and efficient object detection and tracking.
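An illustrative sketch (not the project's code) of the two motion cues named above, GMM-based adaptive background subtraction and Lucas-Kanade sparse optical flow, using OpenCV. The file path "cctv.mp4", the thresholds, and the corner-tracking parameters are assumptions.

```python
# Sketch: deciding when a CCTV frame belongs to an "event clip" from two motion cues.
import cv2
import numpy as np

cap = cv2.VideoCapture("cctv.mp4")                     # placeholder path
backsub = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Foreground mask from the Gaussian mixture background model.
    fg_mask = backsub.apply(frame)
    motion_pixels = cv2.countNonZero(fg_mask)

    # Lucas-Kanade optical flow on tracked corner points.
    if prev_pts is not None:
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, None)
        good = status.flatten() == 1
        displacement = np.linalg.norm(next_pts - prev_pts, axis=2).flatten()[good]
        mean_flow = float(displacement.mean()) if displacement.size else 0.0
    else:
        mean_flow = 0.0

    # A clip boundary could be opened when either motion cue exceeds a threshold.
    if motion_pixels > 5000 or mean_flow > 2.0:
        pass  # e.g. start or extend the current event clip

    prev_gray = gray
    prev_pts = cv2.goodFeaturesToTrack(gray, 200, 0.01, 7)

cap.release()
```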
https://arxiv.org/abs/2412.05331
Wastewater treatment plants face unique challenges for process control due to their complex dynamics, slow time constants, and stochastic delays in observations and actions. These characteristics make conventional control methods, such as Proportional-Integral-Derivative controllers, suboptimal for achieving efficient phosphorus removal, a critical component of wastewater treatment to ensure environmental sustainability. This study addresses these challenges using a novel deep reinforcement learning approach based on the Soft Actor-Critic algorithm, integrated with a custom simulator designed to model the delayed feedback inherent in wastewater treatment plants. The simulator incorporates Long Short-Term Memory networks for accurate multi-step state predictions, enabling realistic training scenarios. To account for the stochastic nature of delays, agents were trained under three delay scenarios: no delay, constant delay, and random delay. The results demonstrate that incorporating random delays into the reinforcement learning framework significantly improves phosphorus removal efficiency while reducing operational costs. Specifically, the delay-aware agent achieved 36% reduction in phosphorus emissions, 55% higher reward, 77% lower target deviation from the regulatory limit, and 9% lower total costs than traditional control methods in the simulated environment. These findings underscore the potential of reinforcement learning to overcome the limitations of conventional control strategies in wastewater treatment, providing an adaptive and cost-effective solution for phosphorus removal.
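A minimal sketch of the idea of training under stochastic observation delays: a wrapper that buffers observations and releases them after a random lag, so the agent always acts on possibly stale measurements. The environment interface (reset/step returning obs, reward, done) and the delay range are assumptions; this is not the paper's LSTM-based simulator.

```python
# Sketch: a random-delay observation wrapper around a step-based environment.
import random
from collections import deque

class RandomDelayWrapper:
    def __init__(self, env, max_delay_steps=5):
        self.env = env
        self.max_delay = max_delay_steps
        self.buffer = deque()          # entries of (release_step, observation)
        self.t = 0
        self.last_obs = None

    def reset(self):
        self.buffer.clear()
        self.t = 0
        self.last_obs = self.env.reset()
        return self.last_obs

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.t += 1
        # Schedule this observation to become visible after a random delay.
        self.buffer.append((self.t + random.randint(0, self.max_delay), obs))
        # Release every observation whose delay has elapsed; the agent sees the newest one.
        while self.buffer and self.buffer[0][0] <= self.t:
            self.last_obs = self.buffer.popleft()[1]
        return self.last_obs, reward, done
```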
https://arxiv.org/abs/2411.18305
Phishing is one of the most effective ways in which cybercriminals get sensitive details such as credentials for online banking, digital wallets, state secrets, and many more from potential victims. They do this by spamming users with malicious URLs with the sole purpose of tricking them into divulging sensitive information which is later used for various cybercrimes. In this research, we did a comprehensive review of current state-of-the-art machine learning and deep learning phishing detection techniques to expose their vulnerabilities and future research directions. For better analysis and observation, we split machine learning techniques into Bayesian, non-Bayesian, and deep learning. We reviewed the most recent advances in Bayesian and non-Bayesian-based classifiers before exploiting their corresponding weaknesses to indicate future research directions. While exploiting weaknesses in both Bayesian and non-Bayesian classifiers, we also compared each performance with a deep learning classifier. For a proper review of deep learning-based classifiers, we looked at Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Long Short-Term Memory networks (LSTMs). We did an empirical analysis to evaluate the performance of each classifier, along with many of the proposed state-of-the-art anti-phishing techniques, to identify future research directions. We also made a series of proposals on how the performance of the under-performing algorithms can be improved, in addition to proposing a two-stage prediction model.
https://arxiv.org/abs/2411.16751
Firing rate models are dynamical systems widely used in applied and theoretical neuroscience to describe local cortical dynamics in neuronal populations. By providing a macroscopic perspective of neuronal activity, these models are essential for investigating oscillatory phenomena, chaotic behavior, and associative memory processes. Despite their widespread use, the application of firing rate models to associative memory networks has received limited mathematical exploration, and most existing studies are focused on specific models. Conversely, well-established associative memory designs, such as Hopfield networks, lack key biologically-relevant features intrinsic to firing rate models, including positivity and interpretable synaptic matrices that reflect excitatory and inhibitory interactions. To address this gap, we propose a general framework that ensures the emergence of re-scaled memory patterns as stable equilibria in the firing rate dynamics. Furthermore, we analyze the conditions under which the memories are locally and globally asymptotically stable, providing insights into constructing biologically-plausible and robust systems for associative memory retrieval.
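For concreteness, a standard positive firing-rate system of the kind discussed above is written out below; the notation is assumed here rather than taken from the paper. Rates stay non-negative through the rectification, the synaptic matrix splits into excitatory and inhibitory parts, and storing a (re-scaled) memory pattern means making it an equilibrium of the dynamics.

```latex
% Firing-rate dynamics with rectification; W = W_E - W_I collects excitatory and
% inhibitory interactions, b is an external input, and tau a time constant.
\tau \dot{x} = -x + \left[\, W x + b \,\right]_+ , \qquad x \in \mathbb{R}^n_{\ge 0}.
% A re-scaled memory pattern \xi is stored when it is an equilibrium of the dynamics:
\xi = \left[\, W \xi + b \,\right]_+ .
```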
https://arxiv.org/abs/2411.07388
Current cardiac cine magnetic resonance image (cMR) studies focus on the end diastole (ED) and end systole (ES) phases, while ignoring the abundant temporal information in the whole image sequence. This is because whole-sequence segmentation is currently a tedious and inaccurate process. Conventional whole-sequence segmentation approaches first estimate the motion field between frames, which is then used to propagate the mask along the temporal axis. However, the mask propagation results can be prone to error, especially for the basal and apex slices, where through-plane motion leads to significant morphology and structural change during the cardiac cycle. Inspired by recent advances in video object segmentation (VOS) based on spatio-temporal memory (STM) networks, we propose a continuous STM (CSTM) network for semi-supervised whole-heart and whole-sequence cMR segmentation. Our CSTM network takes full advantage of the spatial, scale, temporal and through-plane continuity prior of the underlying heart anatomy structures to achieve accurate and fast 4D segmentation. Results of extensive experiments across multiple cMR datasets show that our method can improve the 4D cMR segmentation performance, especially for the hard-to-segment regions.
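A hedged sketch of the space-time memory read that STM-style segmentation builds on: query-frame features attend over memory keys from past frames and retrieve the corresponding values, which carry the mask information. Shapes, the scaling factor, and the random inputs are illustrative; this is not the CSTM implementation.

```python
# Sketch: attention-based read from a space-time memory of past frames.
import torch
import torch.nn.functional as F

def memory_read(query_key, memory_key, memory_value):
    """query_key: (C, H, W); memory_key: (C, T, H, W); memory_value: (Cv, T, H, W)."""
    C, H, W = query_key.shape
    Cv = memory_value.shape[0]
    q = query_key.reshape(C, H * W)                      # (C, HW)
    mk = memory_key.reshape(C, -1)                       # (C, T*HW)
    mv = memory_value.reshape(Cv, -1)                    # (Cv, T*HW)
    affinity = F.softmax(mk.t() @ q / C ** 0.5, dim=0)   # memory locations per query pixel
    read = mv @ affinity                                 # (Cv, HW)
    return read.reshape(Cv, H, W)

out = memory_read(torch.randn(64, 32, 32),
                  torch.randn(64, 4, 32, 32),
                  torch.randn(128, 4, 32, 32))
print(out.shape)  # torch.Size([128, 32, 32])
```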
https://arxiv.org/abs/2410.23191
Personality analysis from online short videos has gained prominence due to its applications in personalized recommendation systems, sentiment analysis, and human-computer interaction. Traditional assessment methods, such as questionnaires based on the Big Five Personality Framework, are limited by self-report biases and are impractical for large-scale or real-time analysis. Leveraging the rich, multi-modal data present in short videos offers a promising alternative for more accurate personality inference. However, integrating these diverse and asynchronous modalities poses significant challenges, particularly in aligning time-varying data and ensuring models generalize well to new domains with limited labeled data. In this paper, we propose a novel multi-modal personality analysis framework that addresses these challenges by synchronizing and integrating features from multiple modalities and enhancing model generalization through domain adaptation. We introduce a timestamp-based modality alignment mechanism that synchronizes data based on spoken word timestamps, ensuring accurate correspondence across modalities and facilitating effective feature integration. To capture temporal dependencies and inter-modal interactions, we employ Bidirectional Long Short-Term Memory networks and self-attention mechanisms, allowing the model to focus on the most informative features for personality prediction. Furthermore, we develop a gradient-based domain adaptation method that transfers knowledge from multiple source domains to improve performance in target domains with scarce labeled data. Extensive experiments on real-world datasets demonstrate that our framework significantly outperforms existing methods in personality prediction tasks, highlighting its effectiveness in capturing complex behavioral cues and robustness in adapting to new domains.
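An illustrative sketch of timestamp-based modality alignment: for every spoken word, the audio-frame features whose timestamps fall inside the word's interval are averaged, so text tokens and audio features line up one-to-one before fusion. The frame rate, feature sizes, and word spans are assumptions, not the paper's data.

```python
# Sketch: aligning frame-level audio features to word-level timestamps.
import numpy as np

def align_audio_to_words(audio_feats, frame_times, word_spans):
    """audio_feats: (num_frames, d); frame_times: (num_frames,) seconds;
    word_spans: list of (start_s, end_s) intervals, one per spoken word."""
    aligned = []
    for start, end in word_spans:
        mask = (frame_times >= start) & (frame_times < end)
        if mask.any():
            aligned.append(audio_feats[mask].mean(axis=0))
        else:                                   # no audio frame inside this word interval
            aligned.append(np.zeros(audio_feats.shape[1]))
    return np.stack(aligned)                    # (num_words, d), parallel to word embeddings

frame_times = np.arange(0, 3.0, 0.01)           # 100 audio frames per second (assumed)
audio_feats = np.random.randn(frame_times.size, 40)
word_spans = [(0.10, 0.45), (0.45, 0.90), (1.20, 1.75)]
print(align_audio_to_words(audio_feats, frame_times, word_spans).shape)  # (3, 40)
```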
https://arxiv.org/abs/2411.00813
Reproducibility in scientific research, particularly within the realm of natural language processing (NLP), is essential for validating and verifying the robustness of experimental findings. This paper delves into the reproduction and evaluation of dialogue summarization models, focusing specifically on the discrepancies observed between original studies and our reproduction efforts. Dialogue summarization is a critical aspect of NLP, aiming to condense conversational content into concise and informative summaries, thus aiding in efficient information retrieval and decision-making processes. Our research involved a thorough examination of several dialogue summarization models using the AMI (Augmented Multi-party Interaction) dataset. The models assessed include Hierarchical Memory Networks (HMNet) and various versions of Pointer-Generator Networks (PGN), namely PGN(DKE), PGN(DRD), PGN(DTS), and PGN(DALL). The primary objective was to evaluate the informativeness and quality of the summaries generated by these models through human assessment, a method that introduces subjectivity and variability in the evaluation process. The analysis began with Dataset 1, where the sample standard deviation of 0.656 indicated a moderate dispersion of data points around the mean.
https://arxiv.org/abs/2410.15962
We developed Long Short-Term Memory (LSTM) models to predict the formation of active regions (ARs) on the solar surface. Using the Doppler shift velocity, the continuum intensity, and the magnetic field observations from the Solar Dynamics Observatory (SDO) Helioseismic and Magnetic Imager (HMI), we have created time-series datasets of acoustic power and magnetic flux, which are used to train LSTM models to predict continuum intensity 12 hours in advance. These novel machine learning (ML) models are able to capture variations of the acoustic power density associated with upcoming magnetic flux emergence and continuum intensity decrease. Testing of the models' performance was done on data for 5 ARs, unseen by the models during training. Model 8, the best-performing model, was able to make a successful prediction of emergence for all testing active regions in an experimental setting and for three of them in an operational setting. The model predicted the emergence of AR11726, AR13165, and AR13179 respectively 10, 29, and 5 hours in advance, and variations of this model achieved average RMSE values of 0.11 for both active and quiet areas on the solar disc. This work sets the foundations for ML-aided prediction of solar ARs.
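A minimal sketch (not the authors' pipeline) of the forecasting setup described above: an LSTM reads a window of acoustic-power and magnetic-flux time series and regresses the continuum intensity 12 hours ahead. The window length, cadence, layer sizes, and random data are assumptions.

```python
# Sketch: LSTM regression of continuum intensity from two input time series.
import torch
import torch.nn as nn

class IntensityForecaster(nn.Module):
    def __init__(self, num_inputs=2, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(num_inputs, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):               # x: (batch, time, [acoustic_power, magnetic_flux])
        out, _ = self.lstm(x)
        return self.head(out[:, -1])    # predicted continuum intensity 12 h after the window

model = IntensityForecaster()
window = torch.randn(16, 48, 2)         # 48 samples per input window (assumed cadence)
prediction = model(window)
loss = nn.functional.mse_loss(prediction, torch.randn(16, 1))
print(prediction.shape, float(loss))
```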
https://arxiv.org/abs/2409.17421
Content-addressable memories such as Modern Hopfield Networks (MHN) have been studied as mathematical models of auto-association and storage/retrieval in the human declarative memory, yet their practical use for large-scale content storage faces challenges. Chief among them is the occurrence of meta-stable states, particularly when handling large amounts of high-dimensional content. This paper introduces Hopfield Encoding Networks (HEN), a framework that integrates encoded neural representations into MHNs to improve pattern separability and reduce meta-stable states. We show that HEN can also be used for retrieval in the context of hetero-association of images with natural language queries, thus removing the limitation of requiring access to partial content in the same domain. Experimental results demonstrate a substantial reduction in meta-stable states and increased storage capacity while still enabling perfect recall of a significantly larger number of inputs, advancing the practical utility of associative memory networks for real-world tasks.
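A hedged sketch of the modern-Hopfield retrieval step that HEN builds on, a softmax-weighted update over stored pattern vectors; the encoder is abstracted away, and the stored patterns, probe, and inverse temperature are random stand-ins rather than the paper's setup.

```python
# Sketch: softmax-based retrieval from a modern Hopfield memory of encoded patterns.
import numpy as np

def mhn_retrieve(stored, query, beta=8.0, steps=3):
    """stored: (num_patterns, d) matrix of encoded patterns; query: (d,) probe vector."""
    xi = query.copy()
    for _ in range(steps):
        scores = beta * stored @ xi                    # similarity to every stored pattern
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        xi = weights @ stored                          # xi <- X^T softmax(beta * X xi)
    return xi

rng = np.random.default_rng(0)
stored = rng.standard_normal((500, 256))
noisy_probe = stored[42] + 0.3 * rng.standard_normal(256)
recalled = mhn_retrieve(stored, noisy_probe)
print(np.argmax(stored @ recalled))                    # ideally 42: the stored pattern recovered
```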
https://arxiv.org/abs/2409.16408
Tracing a student's knowledge growth given the past exercise answering is a vital objective in automatic tutoring systems to customize the learning experience. Yet, achieving this objective is a non-trivial task as it involves modeling the knowledge state across multiple knowledge components (KCs) while considering their temporal and relational dynamics during the learning process. Knowledge tracing methods have tackled this task by either modeling KCs' temporal dynamics using recurrent models or relational dynamics across KCs and questions using graph models. However, there is a lack of methods that can learn a joint embedding of the relational and temporal dynamics of the task. Moreover, many methods that account for the impact of a student's forgetting behavior during the learning process use hand-crafted features, limiting their generalization to different scenarios. In this paper, we propose a novel method that jointly models the relational and temporal dynamics of the knowledge state using a deep temporal graph memory network. In addition, we propose a generic technique for representing a student's forgetting behavior using temporal decay constraints on the graph memory module. We demonstrate the effectiveness of our proposed method using multiple knowledge tracing benchmarks while comparing it to state-of-the-art methods.
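An illustrative sketch of representing forgetting with a temporal decay constraint on a memory module: each memory slot's contribution is down-weighted by exp(-lambda * dt), where dt is the time since the corresponding knowledge component was last practised. The decay rate, slot layout, and scoring rule are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch: reading a knowledge-state vector from a memory with exponential temporal decay.
import numpy as np

def decayed_memory_read(memory, last_practice_time, query, now, decay_rate=0.1):
    """memory: (num_kcs, d) slot matrix; last_practice_time: (num_kcs,) timestamps."""
    dt = now - last_practice_time                      # elapsed time per knowledge component
    forget = np.exp(-decay_rate * dt)                  # older interactions count for less
    scores = (memory @ query) * forget                 # attention scores with decay constraint
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ memory                            # knowledge-state readout

memory = np.random.randn(20, 32)                       # 20 knowledge components
last_seen = np.random.uniform(0, 50, size=20)          # time of each KC's last interaction
state = decayed_memory_read(memory, last_seen, np.random.randn(32), now=60.0)
print(state.shape)                                     # (32,)
```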
https://arxiv.org/abs/2410.01836
The field of artificial intelligence faces significant challenges in achieving both biological plausibility and computational efficiency, particularly in visual learning tasks. Current artificial neural networks, such as convolutional neural networks, rely on techniques like backpropagation and weight sharing, which do not align with the brain's natural information processing methods. To address these issues, we propose the Memory Network, a model inspired by biological principles that avoids backpropagation and convolutions, and operates in a single pass. This approach enables rapid and efficient learning, mimicking the brain's ability to adapt quickly with minimal exposure to data. Our experiments demonstrate that the Memory Network achieves efficient and biologically plausible learning, showing strong performance on simpler datasets like MNIST. However, further refinement is needed for the model to handle more complex datasets such as CIFAR10, highlighting the need to develop new algorithms and techniques that closely align with biological processes while maintaining computational efficiency.
https://arxiv.org/abs/2409.17282
Electroencephalography (EEG) signals are crucial for investigating brain function and cognitive processes. This study aims to address the challenges of efficiently recording and analyzing high-dimensional EEG signals while listening to music to recognize emotional states. We propose a method combining Bidirectional Long Short-Term Memory (Bi-LSTM) networks with attention mechanisms for EEG signal processing. Using wearable EEG devices, we collected brain activity data from participants listening to music. The data was preprocessed, segmented, and Differential Entropy (DE) features were extracted. We then constructed and trained a Bi-LSTM model to enhance key feature extraction and improve emotion recognition accuracy. Experiments were conducted on the SEED and DEAP datasets. The Bi-LSTM-AttGW model achieved 98.28% accuracy on the SEED dataset and 92.46% on the DEAP dataset in multi-class emotion recognition tasks, significantly outperforming traditional models such as SVM and EEG-Net. This study demonstrates the effectiveness of combining Bi-LSTM with attention mechanisms, providing robust technical support for applications in brain-computer interfaces (BCI) and affective computing. Future work will focus on improving device design, incorporating multimodal data, and further enhancing emotion recognition accuracy, aiming to achieve practical applications in real-world scenarios.
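A hedged sketch of the differential entropy (DE) feature commonly extracted from band-passed EEG segments: for an approximately Gaussian segment with variance sigma^2, DE = 0.5 * ln(2*pi*e*sigma^2). The band edges, filter order, and sampling rate are assumptions, not the paper's preprocessing settings.

```python
# Sketch: per-band differential entropy features from one EEG channel segment.
import numpy as np
from scipy.signal import butter, filtfilt

def de_feature(segment, fs, band):
    """Differential entropy of one channel segment within a frequency band."""
    b, a = butter(N=4, Wn=[band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, segment)
    return 0.5 * np.log(2 * np.pi * np.e * np.var(filtered))

fs = 200                                       # assumed sampling rate [Hz]
segment = np.random.randn(fs * 1)              # one 1-second segment of one channel
bands = {"theta": (4, 8), "alpha": (8, 14), "beta": (14, 31), "gamma": (31, 50)}
features = {name: de_feature(segment, fs, edges) for name, edges in bands.items()}
print(features)                                # per-band DE values fed to the Bi-LSTM
```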
https://arxiv.org/abs/2408.12124
In recent years, deep learning has increasingly gained attention in the field of traffic prediction. Existing traffic prediction models often rely on GCNs or attention mechanisms with O(N^2) complexity to dynamically extract traffic node features, which are neither efficient nor lightweight. Additionally, these models typically only utilize historical data for prediction, without considering the impact of the target information on the prediction. To address these issues, we propose a Pattern-Matching Dynamic Memory Network (PM-DMNet). PM-DMNet employs a novel dynamic memory network to capture traffic pattern features with only O(N) complexity, significantly reducing computational overhead while achieving excellent performance. PM-DMNet also introduces two prediction methods: Recursive Multi-step Prediction (RMP) and Parallel Multi-step Prediction (PMP), which leverage the time features of the prediction targets to assist in the forecasting process. Furthermore, a transfer attention mechanism is integrated into PMP, transforming historical data features to better align with the predicted target states, thereby capturing trend changes more accurately and reducing errors. Extensive experiments demonstrate the superiority of the proposed model over existing benchmarks. The source code is available at: this https URL.
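An illustrative sketch of why attending to a small learned memory bank scales as O(N) in the number of traffic nodes N, versus O(N^2) for node-to-node attention: each node scores only M memory entries with M fixed and much smaller than N. The bank size, dimensions, and scoring are assumptions; this is not PM-DMNet itself.

```python
# Sketch: O(N*M) attention of N traffic nodes over an M-entry memory bank (M << N).
import numpy as np

def memory_bank_attention(node_feats, memory_bank):
    """node_feats: (N, d); memory_bank: (M, d); cost is O(N * M), i.e. linear in N."""
    scores = node_feats @ memory_bank.T                 # (N, M), one row per node
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ memory_bank                        # (N, d) pattern-matched features

rng = np.random.default_rng(0)
nodes = rng.standard_normal((2000, 64))                 # N = 2000 traffic nodes
bank = rng.standard_normal((16, 64))                    # M = 16 learned traffic patterns
print(memory_bank_attention(nodes, bank).shape)         # (2000, 64)
```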
https://arxiv.org/abs/2408.07100