Implicit biases in both humans and large language models (LLMs) pose significant societal risks. Dual process theories propose that biases arise primarily from associative System 1 thinking, while deliberative System 2 thinking mitigates bias, but the cognitive mechanisms that give rise to this phenomenon remain poorly understood. To better understand what underlies this duality in humans, and possibly in LLMs, we model System 1 and System 2 thinking as semantic memory networks with distinct structures, built from comparable datasets generated by both humans and LLMs. We then investigate how these distinct semantic memory structures relate to implicit gender bias using network-based evaluation metrics. We find that semantic memory structures are irreducible only in humans, suggesting that LLMs lack certain types of human-like conceptual knowledge. Moreover, semantic memory structure relates consistently to implicit bias only in humans, with lower levels of bias in System 2 structures. These findings suggest that certain types of conceptual knowledge contribute to bias regulation in humans, but not in LLMs, highlighting fundamental differences between human and machine cognition.
https://arxiv.org/abs/2604.12816
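The network-based evaluation metrics are not spelled out in the abstract, but a minimal sketch of one such metric, using networkx, treats implicit association as graph distance between gendered cues and attribute words in a free-association network (the word list and edges here are illustrative, not the study's data):

```python
# Hypothetical sketch: measure implicit association strength in a semantic
# network as graph distance between gendered cues and attribute words.
# Edge list and word choices are illustrative, not the paper's data.
import networkx as nx

edges = [
    ("she", "nurse"), ("she", "home"), ("he", "engineer"),
    ("he", "office"), ("nurse", "care"), ("engineer", "math"),
    ("home", "family"), ("office", "career"), ("family", "career"),
]
G = nx.Graph(edges)

def association_bias(G, cue_a, cue_b, attribute):
    """Positive when `attribute` sits closer to cue_a than to cue_b."""
    d_a = nx.shortest_path_length(G, cue_a, attribute)
    d_b = nx.shortest_path_length(G, cue_b, attribute)
    return d_b - d_a

print(association_bias(G, "she", "he", "career"))  # -1: 'career' is nearer 'he'
```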
In this article, we present a gold-standard benchmark dataset for Biomedical Urdu Named Entity Recognition (BioUNER), developed by crawling health-related articles from online Urdu news portals, medical prescriptions, and hospital health blogs and websites. After preprocessing, three native annotators familiar with the medical domain annotated 153K tokens using the Doccano text annotation tool. Following annotation, the proposed BioUNER dataset was evaluated both intrinsically and extrinsically. An inter-annotator agreement score of 0.78 was achieved, validating the dataset as gold-standard quality. To demonstrate the utility and benchmarking capability of the dataset, we evaluated several machine learning and deep learning models, including Support Vector Machines (SVM), Long Short-Term Memory networks (LSTM), Multilingual BERT (mBERT), and XLM-RoBERTa. The gold-standard BioUNER dataset serves as a reliable benchmark and a valuable addition to Urdu language processing resources.
https://arxiv.org/abs/2604.02904
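For reference, token-level inter-annotator agreement of the kind reported (0.78) is often computed as averaged pairwise Cohen's kappa; a toy sketch with scikit-learn (the paper's exact agreement statistic is not stated):

```python
# Illustrative check of token-level inter-annotator agreement via averaged
# pairwise Cohen's kappa (the paper's 0.78 may use a different statistic).
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Toy BIO tags from three annotators over the same token stream.
ann = {
    "a1": ["B-DISEASE", "I-DISEASE", "O", "B-DRUG", "O"],
    "a2": ["B-DISEASE", "I-DISEASE", "O", "B-DRUG", "O"],
    "a3": ["B-DISEASE", "O",         "O", "B-DRUG", "O"],
}
scores = [cohen_kappa_score(ann[a], ann[b]) for a, b in combinations(ann, 2)]
print(sum(scores) / len(scores))
```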
Recent vision and multimodal foundation backbones, such as Transformer families and state-space models like Mamba, have achieved remarkable progress, enabling unified modeling across images, text, and beyond. Despite their empirical success, these architectures remain far from the computational principles of the human brain, often demanding enormous amounts of training data while offering limited interpretability. In this work, we propose the Vision Hopfield Memory Network (V-HMN), a brain-inspired foundation backbone that integrates hierarchical memory mechanisms with iterative refinement updates. Specifically, V-HMN incorporates local Hopfield modules that provide associative memory dynamics at the image patch level, global Hopfield modules that function as episodic memory for contextual modulation, and a predictive-coding-inspired refinement rule for iterative error correction. By organizing these memory-based modules hierarchically, V-HMN captures both local and global dynamics in a unified framework. Memory retrieval exposes the relationship between inputs and stored patterns, making decisions more interpretable, while the reuse of stored patterns improves data efficiency. This brain-inspired design therefore enhances interpretability and data efficiency beyond existing self-attention- or state-space-based approaches. We conducted extensive experiments on public computer vision benchmarks, and V-HMN achieved competitive results against widely adopted backbone architectures, while offering better interpretability, higher data efficiency, and stronger biological plausibility. These findings highlight the potential of V-HMN to serve as a next-generation vision foundation model, while also providing a generalizable blueprint for multimodal backbones in domains such as text and audio, thereby bridging brain-inspired computation with large-scale machine learning.
https://arxiv.org/abs/2603.25157
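The abstract does not give the update equations, but the local and global modules are described as Hopfield memories; a minimal modern-Hopfield retrieval step (softmax attention over stored patterns, in the style of Ramsauer et al.) illustrates the associative mechanism, without claiming it matches V-HMN's exact module design:

```python
# Minimal modern-Hopfield retrieval: iteratively pull a noisy query toward
# the nearest stored pattern via softmax attention over the memory.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hopfield_retrieve(X, xi, beta=8.0, n_iter=3):
    """X: (N, d) stored patterns; xi: (d,) query. Returns the refined query."""
    for _ in range(n_iter):
        p = softmax(beta * X @ xi)   # attention weights over stored patterns
        xi = X.T @ p                 # move query toward nearest attractor
    return xi

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 32))             # 16 stored patterns
noisy = X[3] + 0.3 * rng.standard_normal(32)  # corrupted copy of pattern 3
print(np.linalg.norm(hopfield_retrieve(X, noisy) - X[3]))  # ~0: retrieved
```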
Integral Field Spectroscopy (IFS) surveys offer a unique new landscape for learning in both spatial and spectroscopic dimensions and could help uncover previously unknown insights into galaxy evolution. In this work, we demonstrate a new unsupervised deep learning framework using Convolutional Long Short-Term Memory Network Autoencoders to encode generalized feature representations across both spatial and spectroscopic dimensions, spanning $19$ optical emission lines (3800Å $< \lambda <$ 8000Å) in a sample of $\sim 9000$ galaxies from the MaNGA IFS survey. As a demonstrative exercise, we assess our model on a sample of $290$ Active Galactic Nuclei (AGN) and highlight scientifically interesting characteristics of some highly anomalous AGN.
https://arxiv.org/abs/2602.18426
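A minimal Keras sketch of a ConvLSTM autoencoder over an emission-line cube, treating the 19 line maps as the sequence axis; layer widths and spatial sizes are placeholders, not the paper's configuration:

```python
# Sketch of a ConvLSTM autoencoder: compress and reconstruct a sequence of
# 2-D emission-line maps. Shapes and filter counts are illustrative only.
from tensorflow.keras import layers, models

T, H, W, C = 19, 32, 32, 1  # 19 emission-line maps as the sequence axis

inp = layers.Input(shape=(T, H, W, C))
z = layers.ConvLSTM2D(16, 3, padding="same", return_sequences=True)(inp)
z = layers.ConvLSTM2D(8, 3, padding="same", return_sequences=True)(z)  # code
out = layers.ConvLSTM2D(C, 3, padding="same", activation="linear",
                        return_sequences=True)(z)  # reconstruction

autoencoder = models.Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.summary()
```

Anomalies such as unusual AGN can then be ranked by reconstruction error or by distance in the learned code space.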
Skeletal muscle-based biohybrid actuators have proven to be promising components in soft robotics, offering efficient movement. However, their intrinsic biological variability and nonlinearity pose significant challenges for controllability and predictability. To address these issues, this study investigates the application of supervised learning, a form of machine learning, to model and predict the behavior of biohybrid machines (BHMs), focusing on a muscle ring anchored on flexible polymer pillars. First, static prediction models (i.e., random forest and neural network regressors) are trained to estimate the maximum exerted force from input variables such as muscle sample, electrical stimulation parameters, and baseline exerted force. Second, a dynamic modeling framework based on Long Short-Term Memory networks is developed to serve as a digital twin, replicating the time series of exerted forces observed in response to electrical stimulation. Both modeling approaches demonstrate high predictive accuracy: the best static model attains an R2 of 0.9425, while the dynamic model achieves an R2 of 0.9956. The static models can enable optimization of muscle actuator performance for targeted applications and required force outcomes, while the dynamic model provides a foundation for developing robustly adaptive control strategies in future biohybrid robotic systems.
https://arxiv.org/abs/2602.16330
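A toy version of the static modeling step, assuming illustrative feature columns (muscle sample, stimulation parameters, baseline force) and synthetic data rather than the study's measurements:

```python
# Toy static model: random-forest regressor mapping stimulation parameters
# and baseline force to peak force. Features and data are stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# columns: [sample_id, stim_frequency_hz, stim_voltage_v, baseline_force_uN]
X = rng.uniform([0, 1, 1, 10], [5, 20, 10, 200], size=(300, 4))
y = 0.8 * X[:, 3] + 5 * X[:, 1] + rng.normal(0, 5, 300)  # synthetic target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"R2 = {r2_score(y_te, model.predict(X_te)):.3f}")
```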
Large Language Models (LLMs) encounter significant performance bottlenecks in long-sequence tasks due to the computational complexity and memory overhead inherent in the self-attention mechanism. To address these challenges, we introduce AllMem, a novel and efficient hybrid architecture that integrates Sliding Window Attention (SWA) with non-linear Test-Time Training (TTT) memory networks. AllMem enables models to scale effectively to ultra-long contexts while mitigating catastrophic forgetting. This approach not only overcomes the representation constraints typical of linear memory models but also significantly reduces the computational and memory footprint during long-sequence inference. Furthermore, we implement a Memory-Efficient Fine-Tuning strategy to replace standard attention layers in pre-trained models with memory-augmented sliding window layers. This framework facilitates the efficient transformation of any off-the-shelf pre-trained LLM into an AllMem-based architecture. Empirical evaluations confirm that our 4k window model achieves near-lossless performance on LongBench at a 37k context, with a marginal 0.83-point drop relative to full attention. Moreover, on InfiniteBench at a 128k context, our 8k window variant outperforms full attention, which validates the effectiveness of our parameterized memory in mitigating noise and maintaining robust long-range modeling without the prohibitive costs of global attention.
https://arxiv.org/abs/2602.13680
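The TTT memory path cannot be reproduced from the abstract alone, but the sliding-window half of the design reduces to a simple attention mask; a sketch in PyTorch:

```python
# Sketch of the sliding-window attention mask that bounds per-token context;
# the TTT memory component of AllMem is not reproduced here.
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """True where query i may attend key j: causal and within `window`."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (i - j < window)

mask = sliding_window_mask(seq_len=8, window=4)
print(mask.int())
# Each row has at most `window` True entries, so attention cost scales as
# O(L * window) instead of O(L^2).
```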
Accurate segmentation of cardiac chambers in echocardiography sequences is crucial for the quantitative analysis of cardiac function, aiding clinical diagnosis and treatment. Imaging noise, artifacts, and the deformation and motion of the heart pose challenges to segmentation algorithms. While existing methods based on convolutional neural networks, Transformers, and space-time memory networks have improved segmentation accuracy, they often struggle to balance capturing long-range spatiotemporal dependencies against maintaining computational efficiency and fine-grained feature representation. In this paper, we introduce GDKVM, a novel architecture for echocardiography video segmentation. The model employs Linear Key-Value Association (LKVA) to effectively model inter-frame correlations and introduces a Gated Delta Rule (GDR) to efficiently store intermediate memory states. A Key-Pixel Feature Fusion (KPFF) module integrates local and global features at multiple scales, enhancing robustness against boundary blurring and noise interference. We validated GDKVM on two mainstream echocardiography video datasets (CAMUS and EchoNet-Dynamic) and compared it with various state-of-the-art methods. Experimental results show that GDKVM outperforms existing approaches in segmentation accuracy and robustness while ensuring real-time performance. Code is available at this https URL.
https://arxiv.org/abs/2512.10252
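The abstract names a Gated Delta Rule for memory updates; one common formulation from the linear-attention literature (gated DeltaNet-style) is shown below. Whether GDKVM uses exactly this form is not stated, so treat it as an illustrative sketch:

```python
# One common gated delta-rule memory update: decay the old memory, erase the
# old value stored at key k, and write the new value v. Illustrative only.
import numpy as np

def gated_delta_step(S, k, v, q, alpha, beta):
    """S: (d_v, d_k) memory; k, q: (d_k,); v: (d_v,); alpha, beta in [0, 1]."""
    d_k = k.shape[0]
    S = alpha * S @ (np.eye(d_k) - beta * np.outer(k, k)) + beta * np.outer(v, k)
    return S, S @ q  # updated memory and read-out for query q

rng = np.random.default_rng(0)
d_k, d_v = 8, 8
S = np.zeros((d_v, d_k))
k = rng.standard_normal(d_k); k /= np.linalg.norm(k)
v = rng.standard_normal(d_v)
S, out = gated_delta_step(S, k, v, q=k, alpha=0.95, beta=1.0)
print(np.linalg.norm(out - v))  # 0: with beta=1, reading back at k recovers v
```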
Accurate Remaining Useful Life (RUL) prediction coupled with uncertainty quantification remains a critical challenge in aerospace prognostics. This research introduces a novel uncertainty-aware deep learning framework that learns aleatoric uncertainty directly through probabilistic modeling, an approach unexplored in existing CMAPSS-based literature. Our hierarchical architecture integrates multi-scale Inception blocks for temporal pattern extraction, bidirectional Long Short-Term Memory networks for sequential modeling, and a dual-level attention mechanism operating simultaneously on sensor and temporal dimensions. The innovation lies in the Bayesian output layer that predicts both mean RUL and variance, enabling the model to learn data-inherent uncertainty. Comprehensive preprocessing employs condition-aware clustering, wavelet denoising, and intelligent feature selection. Experimental validation on NASA CMAPSS benchmarks (FD001-FD004) demonstrates competitive overall performance with RMSE values of 16.22, 19.29, 16.84, and 19.98 respectively. Remarkably, our framework achieves breakthrough critical zone performance (RUL <= 30 cycles) with RMSE of 5.14, 6.89, 5.27, and 7.16, representing 25-40 percent improvements over conventional approaches and establishing new benchmarks for safety-critical predictions. The learned uncertainty provides well-calibrated 95 percent confidence intervals with coverage ranging from 93.5 percent to 95.2 percent, enabling risk-aware maintenance scheduling previously unattainable in CMAPSS literature.
https://arxiv.org/abs/2511.19124
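A minimal sketch of the described Bayesian output layer: the head emits a mean and a log-variance per engine and is trained with the Gaussian negative log-likelihood; the surrounding Inception/BiLSTM/attention encoder is omitted and all sizes are illustrative:

```python
# Heteroscedastic RUL head: predicts mean and log-variance, trained with the
# Gaussian negative log-likelihood. Encoder and sizes are placeholders.
import torch
import torch.nn as nn

class ProbabilisticRULHead(nn.Module):
    def __init__(self, in_features: int):
        super().__init__()
        self.mean = nn.Linear(in_features, 1)
        self.log_var = nn.Linear(in_features, 1)

    def forward(self, h):
        return self.mean(h).squeeze(-1), self.log_var(h).squeeze(-1)

def gaussian_nll(mu, log_var, y):
    # 0.5 * (log sigma^2 + (y - mu)^2 / sigma^2), up to an additive constant
    return 0.5 * (log_var + (y - mu) ** 2 / log_var.exp()).mean()

head = ProbabilisticRULHead(in_features=64)
h = torch.randn(32, 64)   # features from the (omitted) encoder
y = torch.rand(32) * 125  # RUL targets in cycles
mu, log_var = head(h)
gaussian_nll(mu, log_var, y).backward()
# A 95% interval is then mu +/- 1.96 * exp(0.5 * log_var).
```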
This paper compares Kolmogorov-Arnold Networks (KANs) and Long Short-Term Memory networks (LSTMs) for forecasting non-deterministic stock price data, evaluating predictive accuracy versus interpretability trade-offs using Root Mean Square Error (RMSE). LSTMs demonstrate substantial superiority across all tested prediction horizons, confirming their established effectiveness for sequential data modelling. Standard KANs, while offering theoretical interpretability through the Kolmogorov-Arnold representation theorem, exhibit significantly higher error rates and limited practical applicability for time series forecasting. The results confirm LSTM dominance in accuracy-critical time series applications while identifying computational efficiency as KANs' primary advantage in resource-constrained scenarios where accuracy requirements are less stringent. The findings support LSTM adoption for practical financial forecasting while suggesting that continued research into specialised KAN architectures may yield future improvements.
https://arxiv.org/abs/2511.18613
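The comparison metric in miniature, RMSE per prediction horizon for the two model families; the forecast arrays below are stand-ins, not the paper's results:

```python
# Per-horizon RMSE comparison of two forecasters (values are stand-ins).
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

actual  = {1: [101.2, 99.8], 5: [103.5, 98.0], 10: [106.1, 97.2]}
lstm_fc = {1: [100.9, 99.5], 5: [102.1, 99.0], 10: [104.0, 99.5]}
kan_fc  = {1: [99.0, 101.0], 5: [99.5, 102.0], 10: [100.0, 103.0]}

for h in actual:  # horizon in days
    print(f"h={h:>2}: LSTM {rmse(actual[h], lstm_fc[h]):.3f}"
          f"  KAN {rmse(actual[h], kan_fc[h]):.3f}")
```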
We study a systematic approach to a popular Statistical Arbitrage technique: Pairs Trading. Instead of relying on two highly correlated assets, we replace the second asset with a replication of the first using risk-factor representations. These factors are obtained through Principal Components Analysis (PCA), exchange-traded funds (ETFs), and, as our main contribution, Long Short-Term Memory networks (LSTMs). Residuals between the main asset and its replication are examined for mean-reversion properties, and trading signals are generated for sufficiently fast mean-reverting portfolios. Beyond introducing a deep-learning-based replication method, we adapt the framework of Avellaneda and Lee (2008) to the Polish market. Accordingly, components of WIG20, mWIG40, and selected sector indices replace the original S&P500 universe, and market parameters such as the risk-free rate and transaction costs are updated to reflect local conditions. We outline the full strategy pipeline: risk factor construction, residual modeling via the Ornstein-Uhlenbeck process, and signal generation. Each replication technique is described together with its practical implementation. Strategy performance is evaluated over two periods: 2017-2019 and the recessive year 2020. All methods yield profits in 2017-2019, with PCA achieving roughly 20 percent cumulative return and an annualized Sharpe ratio of up to 2.63. Despite multiple adaptations, our conclusions remain consistent with those of the original paper. During the COVID-19 recession, only the ETF-based approach remains profitable (about 5 percent annual return), while PCA and LSTM methods underperform. LSTM results, although negative, are promising and indicate potential for future optimization.
https://arxiv.org/abs/2512.02037
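A sketch of the Avellaneda-Lee-style residual signal the pipeline describes: fit a discrete-time Ornstein-Uhlenbeck (AR(1)) model to the residuals, convert the current residual to an s-score, and trade on thresholds. The 252-day convention and the threshold values are the usual defaults from that literature, not necessarily the paper's exact settings:

```python
# Fit AR(1) to a residual series (discrete-time OU), derive the s-score, and
# generate a threshold signal for sufficiently fast mean-reverters.
import numpy as np

def ou_s_score(residuals: np.ndarray, periods_per_year: int = 252):
    x0, x1 = residuals[:-1], residuals[1:]
    b, a = np.polyfit(x0, x1, 1)               # x1 = a + b * x0 + noise
    kappa = -np.log(b) * periods_per_year      # mean-reversion speed
    m = a / (1.0 - b)                          # long-run mean
    eps = x1 - (a + b * x0)
    sigma_eq = np.sqrt(eps.var(ddof=1) / (1.0 - b ** 2))
    return (residuals[-1] - m) / sigma_eq, kappa

rng = np.random.default_rng(1)
x = np.zeros(500)
for t in range(1, 500):                        # simulate a mean-reverting path
    x[t] = 0.9 * x[t - 1] + rng.normal(0, 0.1)

s, kappa = ou_s_score(x)
if kappa > 252 / 30:                           # trade only fast reverters
    signal = "short" if s > 1.25 else "long" if s < -1.25 else "flat"
    print(f"s={s:.2f}, kappa={kappa:.1f}/yr -> {signal}")
```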
Reliable hydrologic and flood forecasting requires models that remain stable when input data are delayed, missing, or inconsistent. However, most advances in rainfall-runoff prediction have been evaluated under ideal data conditions, emphasizing accuracy rather than operational resilience. Here, we develop an operationally ready emulator of the Global Flood Awareness System (GloFAS) that couples Long Short-Term Memory networks with a relaxed water-balance constraint to preserve physical coherence. Five architectures span a continuum of information availability, from complete historical and forecast forcings to scenarios with data latency and outages, allowing systematic evaluation of robustness. Trained in minimally managed catchments across the United States and tested in more than 5,000 basins, including heavily regulated rivers in India, the emulator reproduces the hydrological core of GloFAS and degrades smoothly as information quality declines. Transfer across contrasting hydroclimatic and management regimes yields reduced yet physically consistent performance, defining the limits of generalization under data scarcity and human influence. The framework establishes operational robustness as a measurable property of hydrological machine learning and advances the design of reliable real-time forecasting systems.
https://arxiv.org/abs/2510.18535
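The abstract does not define the relaxed water-balance constraint; one plausible reading, sketched in PyTorch, is a hinge penalty on the runoff/water-budget mismatch that is zero inside a tolerance band:

```python
# Hypothetical relaxed water-balance penalty: penalize simulated runoff only
# when it departs from the available water budget by more than `tol`.
import torch
import torch.nn.functional as F

def water_balance_penalty(q_pred, precip, evap, tol=0.05):
    """q_pred, precip, evap: (batch, T) tensors in the same units (e.g. mm)."""
    budget = (precip - evap).sum(dim=1)          # water available per window
    runoff = q_pred.sum(dim=1)
    rel_err = (runoff - budget).abs() / budget.clamp(min=1e-6)
    return F.relu(rel_err - tol).mean()          # zero inside the tolerance band

def training_loss(q_pred, q_obs, precip, evap, lam=0.1):
    return F.mse_loss(q_pred, q_obs) + lam * water_balance_penalty(
        q_pred, precip, evap)

q = torch.rand(4, 30)                            # simulated runoff, 30-day window
p, e = torch.rand(4, 30) * 2, torch.rand(4, 30) * 0.5
print(training_loss(q, torch.rand(4, 30), p, e))
```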
We present a reproducibility study of the state-of-the-art neural architecture for sequence labeling proposed by Ma and Hovy (2016). The original BiLSTM-CNN-CRF model combines character-level representations via Convolutional Neural Networks (CNNs), word-level context modeling through Bi-directional Long Short-Term Memory networks (BiLSTMs), and structured prediction using Conditional Random Fields (CRFs). This end-to-end approach eliminates the need for hand-crafted features while achieving excellent performance on named entity recognition (NER) and part-of-speech (POS) tagging tasks. Our implementation successfully reproduces the key results, achieving a 91.18% F1-score on CoNLL-2003 NER and demonstrating the model's effectiveness across sequence labeling tasks. We provide a detailed analysis of the architecture components and release an open-source PyTorch implementation to facilitate further research.
https://arxiv.org/abs/2510.10936
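A compact skeleton of the reproduced architecture: char-CNN features concatenated with word embeddings, a BiLSTM encoder, and per-token emission scores for a CRF. The CRF layer itself (e.g. the third-party pytorch-crf package) is left as a comment, and the hyperparameters are illustrative, not necessarily the paper's:

```python
# BiLSTM-CNN skeleton producing emission scores; a CRF layer would sit on top.
import torch
import torch.nn as nn

class BiLSTMCNN(nn.Module):
    def __init__(self, n_words, n_chars, n_tags,
                 w_dim=100, c_dim=30, c_filters=30, hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, w_dim)
        self.char_emb = nn.Embedding(n_chars, c_dim)
        self.char_cnn = nn.Conv1d(c_dim, c_filters, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(w_dim + c_filters, hidden // 2,
                            bidirectional=True, batch_first=True)
        self.emit = nn.Linear(hidden, n_tags)
        # self.crf = CRF(n_tags, batch_first=True)  # e.g. from pytorch-crf

    def forward(self, words, chars):
        # words: (B, T) word ids; chars: (B, T, L) char ids per word
        B, T, L = chars.shape
        c = self.char_emb(chars).view(B * T, L, -1).transpose(1, 2)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values.view(B, T, -1)
        x = torch.cat([self.word_emb(words), c], dim=-1)
        h, _ = self.lstm(x)
        return self.emit(h)  # (B, T, n_tags) emission scores for the CRF

model = BiLSTMCNN(n_words=5000, n_chars=80, n_tags=9)
scores = model(torch.randint(0, 5000, (2, 12)), torch.randint(0, 80, (2, 12, 8)))
print(scores.shape)  # torch.Size([2, 12, 9])
```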
Social media has become an essential part of the digital age, serving as a platform for communication, interaction, and information sharing. Celebrities are among the most active users and often reveal aspects of their personal and professional lives through online posts. Platforms such as Twitter provide an opportunity to analyze language and behavior for understanding demographic and social patterns. Since followers frequently share linguistic traits and interests with the celebrities they follow, textual data from followers can be used to predict celebrity demographics. However, most existing research in this field has focused on English and other high-resource languages, leaving Urdu largely unexplored. This study applies modern machine learning and deep learning techniques to the problem of celebrity profiling in Urdu. A dataset of short Urdu tweets from followers of subcontinent celebrities was collected and preprocessed. Multiple algorithms were trained and compared, including Logistic Regression, Support Vector Machines, Random Forests, Convolutional Neural Networks, and Long Short-Term Memory networks. The models were evaluated using accuracy, precision, recall, F1-score, and cumulative rank (cRank). The best performance was achieved for gender prediction with a cRank of 0.65 and an accuracy of 0.65, followed by moderate results for age, profession, and fame prediction. These results demonstrate that follower-based linguistic features can be effectively leveraged using machine learning and neural approaches for demographic prediction in Urdu, a low-resource language.
https://arxiv.org/abs/2510.11739
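A baseline of the kind compared in the study, assuming a character n-gram TF-IDF representation with logistic regression for the gender label; the data and n-gram range are placeholders, and Urdu text is handled as plain Unicode:

```python
# Toy follower-based profiling baseline: char n-gram TF-IDF + logistic
# regression. Tweets and labels are placeholders, not the paper's data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["...follower tweet text...", "...another follower tweet..."]  # Urdu
labels = ["female", "male"]  # gender of the followed celebrity

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(tweets, labels)
print(clf.predict(["...unseen follower text..."]))
```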
This review conducts a comparative analysis of liquid neural networks (LNNs) and traditional recurrent neural networks (RNNs) and their variants, such as long short-term memory networks (LSTMs) and gated recurrent units (GRUs). The core dimensions of the analysis are model accuracy, memory efficiency, and generalization ability. By systematically reviewing existing research, this paper explores the basic principles, mathematical models, key characteristics, and inherent challenges of these neural network architectures in processing sequential data. The findings reveal that LNNs, as an emerging class of biologically inspired, continuous-time dynamic neural networks, demonstrate significant potential in handling noisy, non-stationary data and achieving out-of-distribution (OOD) generalization. Additionally, some LNN variants outperform traditional RNNs in parameter efficiency and computational speed. However, RNNs remain a cornerstone of sequence modeling due to their mature ecosystem and successful applications across various tasks. This review identifies the commonalities and differences between LNNs and RNNs, summarizes their respective shortcomings and challenges, and points out valuable directions for future research, particularly emphasizing the importance of improving the scalability of LNNs to promote their application in broader and more complex scenarios.
https://arxiv.org/abs/2510.07578
Solar Proton Events (SPEs) cause significant radiation hazards to satellites, astronauts, and technological systems. Accurate forecasting of their proton flux time profiles is crucial for early warnings and mitigation. This paper explores deep learning sequence-to-sequence (seq2seq) models based on Long Short-Term Memory networks to predict 24-hour proton flux profiles following SPE onsets. We used a dataset of 40 well-connected SPEs (1997-2017) observed by NOAA GOES, each associated with a >=M-class western-hemisphere solar flare and an undisturbed proton flux profile. Using 4-fold stratified cross-validation, we evaluate seq2seq model configurations (varying hidden units and embedding dimensions) under multiple forecasting scenarios: (i) proton-only input vs. combined proton+X-ray input, (ii) original flux data vs. trend-smoothed data, and (iii) autoregressive vs. one-shot forecasting. Our major results are as follows. First, one-shot forecasting consistently yields lower error than autoregressive prediction, avoiding the error accumulation seen in iterative approaches. Second, on the original data, proton-only models outperform proton+X-ray models; with trend-smoothed data, however, this gap narrows or even reverses in favor of the proton+X-ray models. Third, trend-smoothing significantly enhances the performance of proton+X-ray models by mitigating fluctuations in the X-ray channel. Fourth, while models trained on trend-smoothed data perform best on average, the single best-performing model was trained on original data, suggesting that architectural choices can sometimes outweigh the benefits of data preprocessing.
https://arxiv.org/abs/2510.05399
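A sketch of the one-shot seq2seq variant in Keras: encode the onset history once, then emit all 24 hourly flux values in a single decoder pass, avoiding autoregressive error accumulation. Window lengths and layer sizes are placeholders:

```python
# One-shot seq2seq: encoder LSTM summarizes the onset, RepeatVector fans the
# summary out to 24 decoder steps, and all hours are predicted at once.
from tensorflow.keras import layers, models

n_in, n_out, n_feat = 48, 24, 2   # e.g. 48 input steps of proton + X-ray flux

model = models.Sequential([
    layers.Input(shape=(n_in, n_feat)),
    layers.LSTM(64),                          # encoder summary of the onset
    layers.RepeatVector(n_out),               # one copy per forecast hour
    layers.LSTM(64, return_sequences=True),   # decoder over 24 hours
    layers.TimeDistributed(layers.Dense(1)),  # proton flux per hour
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```

The autoregressive alternative would instead feed each predicted hour back in as input for the next, which is where iterative error accumulation arises.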
This study applies a range of forecasting techniques, including ARIMA, Prophet, Long Short-Term Memory networks (LSTM), Temporal Convolutional Networks (TCN), and XGBoost, to model and predict Russian equipment losses during the ongoing war in Ukraine. Drawing on daily and monthly open-source intelligence (OSINT) data from WarSpotting, we aim to assess trends in attrition, evaluate model performance, and estimate future loss patterns through the end of 2025. Our findings show that deep learning models, particularly TCN and LSTM, produce stable and consistent forecasts, especially under conditions of high temporal granularity. By comparing different model architectures and input structures, this study highlights the importance of ensemble forecasting in conflict modeling and the value of publicly available OSINT data in quantifying material degradation over time.
https://arxiv.org/abs/2509.07813
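One simple ensemble combination of the sort the study motivates, weighting each model's forecast by inverse validation RMSE; models and numbers below are stand-ins, not the study's figures:

```python
# Inverse-RMSE-weighted ensemble of per-model forecasts (values are stand-ins).
import numpy as np

forecasts = {            # next-6-month equipment-loss forecasts per model
    "arima": np.array([310, 305, 298, 290, 285, 280]),
    "tcn":   np.array([320, 312, 300, 295, 288, 283]),
    "lstm":  np.array([315, 308, 302, 293, 287, 281]),
}
val_rmse = {"arima": 25.0, "tcn": 14.0, "lstm": 15.5}

w = np.array([1 / val_rmse[m] for m in forecasts])
w /= w.sum()             # better validation error -> larger weight
ensemble = sum(wi * forecasts[m] for wi, m in zip(w, forecasts))
print(np.round(ensemble, 1))
```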
Speech Emotion Recognition (SER) presents a significant yet persistent challenge in human-computer interaction. While deep learning has advanced spoken language processing, achieving high performance on limited datasets remains a critical hurdle. This paper confronts this issue by developing and evaluating a suite of machine learning models, including Support Vector Machines (SVMs), Long Short-Term Memory networks (LSTMs), and Convolutional Neural Networks (CNNs), for automated emotion classification in human speech. We demonstrate that by strategically employing transfer learning and innovative data augmentation techniques, our models can achieve impressive performance despite the constraints of a relatively small dataset. Our most effective model, a ResNet34 architecture, establishes a new performance benchmark on the combined RAVDESS and SAVEE datasets, attaining an accuracy of 66.7% and an F1 score of 0.631. These results underscore the substantial benefits of leveraging pre-trained models and data augmentation to overcome data scarcity, thereby paving the way for more robust and generalizable SER systems.
https://arxiv.org/abs/2509.00077
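The transfer-learning setup in miniature: a pretrained ResNet34 with its classifier head swapped for the emotion classes. Feeding audio as 3-channel spectrogram images is our assumption here, not a detail given in the abstract:

```python
# Transfer learning sketch: pretrained ResNet34, new classification head.
import torch
import torch.nn as nn
from torchvision import models

n_emotions = 8  # RAVDESS-style label set

net = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
net.fc = nn.Linear(net.fc.in_features, n_emotions)

# Optionally freeze the backbone and train only the new head at first.
for p in net.parameters():
    p.requires_grad = False
for p in net.fc.parameters():
    p.requires_grad = True

x = torch.randn(4, 3, 224, 224)   # batch of spectrogram "images" (assumed)
print(net(x).shape)               # torch.Size([4, 8])
```

Data augmentation (e.g. noise injection or time shifting on the waveform before spectrogram conversion) would be applied upstream of this model.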
Visual reasoning is critical for a wide range of computer vision tasks that go beyond surface-level object detection and classification. Despite notable advances in relational, symbolic, temporal, causal, and commonsense reasoning, existing surveys often address these directions in isolation, lacking a unified analysis and comparison across reasoning types, methodologies, and evaluation protocols. This survey aims to address this gap by categorizing visual reasoning into five major types (relational, symbolic, temporal, causal, and commonsense) and systematically examining their implementation through architectures such as graph-based models, memory networks, attention mechanisms, and neuro-symbolic systems. We review evaluation protocols designed to assess functional correctness, structural consistency, and causal validity, and critically analyze their limitations in terms of generalizability, reproducibility, and explanatory power. Beyond evaluation, we identify key open challenges in visual reasoning, including scalability to complex scenes, deeper integration of symbolic and neural paradigms, the lack of comprehensive benchmark datasets, and reasoning under weak supervision. Finally, we outline a forward-looking research agenda for next-generation vision systems, emphasizing that bridging perception and reasoning is essential for building transparent, trustworthy, and cross-domain adaptive AI systems, particularly in critical domains such as autonomous driving and medical diagnostics.
https://arxiv.org/abs/2508.10523
We introduce Residual Reservoir Memory Networks (ResRMNs), a novel class of untrained Recurrent Neural Networks (RNNs) within the Reservoir Computing (RC) paradigm. ResRMNs combine a linear memory reservoir with a non-linear reservoir, where the latter is based on residual orthogonal connections along the temporal dimension for enhanced long-term propagation of the input. The resulting reservoir state dynamics are studied through the lens of linear stability analysis, and we investigate diverse configurations for the temporal residual connections. The proposed approach is empirically assessed on time-series and pixel-level 1-D classification tasks. Our experimental results highlight the advantages of the proposed approach over other conventional RC models.
https://arxiv.org/abs/2508.09925
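The exact state equation is not given in the abstract; the sketch below is only a guess at the flavor of the design: an untrained reservoir whose previous state is carried forward through a fixed orthogonal map and mixed with a standard tanh update:

```python
# Speculative sketch of a temporal orthogonal residual in a reservoir.
# The paper's actual state equation may differ.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_res, T, alpha = 1, 100, 200, 0.5

O, _ = np.linalg.qr(rng.standard_normal((d_res, d_res)))  # orthogonal residual map
W = rng.standard_normal((d_res, d_res)) * 0.1             # fixed recurrent weights
U = rng.standard_normal((d_res, d_in)) * 0.5              # fixed input weights

x = rng.standard_normal((T, d_in))
h = np.zeros(d_res)
states = []
for t in range(T):
    # The orthogonal skip preserves the norm of the carried state; alpha
    # trades off long-term memory against the nonlinear update.
    h = alpha * (O @ h) + (1 - alpha) * np.tanh(W @ h + U @ x[t])
    states.append(h.copy())

# As in reservoir computing, only a linear readout on `states` is trained.
print(np.stack(states).shape)  # (200, 100)
```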
The evolution towards future generations of mobile systems and fixed wireless networks is primarily driven by the urgency to support high-bandwidth and low-latency services across various vertical sectors. This endeavor is fueled by smartphones as well as technologies like the industrial internet of things, extended reality (XR), and human-to-machine (H2M) collaboration, which foster industrial and social revolutions like Industry 4.0/5.0 and Society 5.0. To ensure an ideal immersive experience and avoid cyber-sickness in all the aforementioned usage scenarios, XR content must be synchronized in real time over communication networks from a remote machine to a human collaborator, tracking the human's head movements across a large geographic span, which is typically challenging. Thus, we propose a novel H2M collaboration scheme in which the human's head movements are predicted ahead of time with highly accurate models, such as bidirectional Long Short-Term Memory networks, to orient the machine's camera in advance. We validate that XR frame size varies with the human's head movements and predict the corresponding bandwidth requirements from the machine's camera, leading to a human-machine coordinated dynamic bandwidth allocation (HMC-DBA) scheme. Through extensive simulations, we show that the end-to-end latency and jitter requirements of XR frames are satisfied with much lower bandwidth consumption over enterprise networks like Fiber-To-The-Room-Business. Furthermore, we show that our proposed HMC-DBA achieves better network-resource utilization than state-of-the-art schemes.
https://arxiv.org/abs/2507.15254
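A sketch of the prediction step: a bidirectional LSTM over recent head-pose samples forecasting the next orientation, from which XR frame size and the corresponding bandwidth grant would be derived. Window length and dimensions are placeholders:

```python
# BiLSTM head-movement predictor: past (yaw, pitch, roll) samples in, next
# head pose out. Sizes are illustrative, not the paper's configuration.
from tensorflow.keras import layers, models

window, horizon, n_angles = 30, 1, 3   # 30 past pose samples -> 1 step ahead

model = models.Sequential([
    layers.Input(shape=(window, n_angles)),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(horizon * n_angles),  # predicted head pose one step ahead
])
model.compile(optimizer="adam", loss="mse")
model.summary()

# The predicted pose would steer the remote camera in advance, and the
# implied XR frame size would size the dynamic bandwidth allocation grant.
```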