Accurate segmentation of cardiac chambers in echocardiography sequences is crucial for the quantitative analysis of cardiac function, aiding in clinical diagnosis and treatment. However, imaging noise, artifacts, and cardiac deformation and motion pose challenges to segmentation algorithms. While existing methods based on convolutional neural networks, Transformers, and space-time memory networks have improved segmentation accuracy, they often struggle to balance capturing long-range spatiotemporal dependencies against maintaining computational efficiency and fine-grained feature representation. In this paper, we introduce GDKVM, a novel architecture for echocardiography video segmentation. The model employs Linear Key-Value Association (LKVA) to effectively model inter-frame correlations and introduces a Gated Delta Rule (GDR) to efficiently store intermediate memory states. A Key-Pixel Feature Fusion (KPFF) module is designed to integrate local and global features at multiple scales, enhancing robustness against boundary blurring and noise interference. We validated GDKVM on two mainstream echocardiography video datasets (CAMUS and EchoNet-Dynamic) and compared it with various state-of-the-art methods. Experimental results show that GDKVM outperforms existing approaches in segmentation accuracy and robustness while ensuring real-time performance. Code is available at this https URL.
https://arxiv.org/abs/2512.10252
Accurate Remaining Useful Life (RUL) prediction coupled with uncertainty quantification remains a critical challenge in aerospace prognostics. This research introduces a novel uncertainty-aware deep learning framework that learns aleatoric uncertainty directly through probabilistic modeling, an approach unexplored in existing CMAPSS-based literature. Our hierarchical architecture integrates multi-scale Inception blocks for temporal pattern extraction, bidirectional Long Short-Term Memory networks for sequential modeling, and a dual-level attention mechanism operating simultaneously on sensor and temporal dimensions. The innovation lies in the Bayesian output layer that predicts both mean RUL and variance, enabling the model to learn data-inherent uncertainty. Comprehensive preprocessing employs condition-aware clustering, wavelet denoising, and intelligent feature selection. Experimental validation on NASA CMAPSS benchmarks (FD001-FD004) demonstrates competitive overall performance with RMSE values of 16.22, 19.29, 16.84, and 19.98 respectively. Remarkably, our framework achieves breakthrough critical zone performance (RUL <= 30 cycles) with RMSE of 5.14, 6.89, 5.27, and 7.16, representing 25-40 percent improvements over conventional approaches and establishing new benchmarks for safety-critical predictions. The learned uncertainty provides well-calibrated 95 percent confidence intervals with coverage ranging from 93.5 percent to 95.2 percent, enabling risk-aware maintenance scheduling previously unattainable in CMAPSS literature.
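The paper's central idea, a Bayesian output layer that predicts both a mean RUL and a variance, is typically trained with a Gaussian negative log-likelihood. A minimal NumPy sketch of that loss and the confidence interval it implies (illustrative only, not the authors' implementation; the network producing `mean` and `log_var` is omitted):

```python
import numpy as np

def gaussian_nll(mean, log_var, target):
    """Per-sample negative log-likelihood of `target` under N(mean, exp(log_var)).

    Predicting the log-variance keeps the variance positive and the loss
    numerically stable; minimizing it lets a network learn data-inherent
    (aleatoric) uncertainty alongside its mean RUL estimate.
    """
    var = np.exp(log_var)
    return 0.5 * (np.log(2.0 * np.pi) + log_var + (target - mean) ** 2 / var)

def confidence_interval(mean, log_var, z=1.96):
    """Symmetric ~95% interval implied by the learned Gaussian."""
    std = np.exp(0.5 * log_var)
    return mean - z * std, mean + z * std

# For a large error (20 cycles), the loss rewards honestly widening the
# predicted variance instead of staying overconfident.
mean, target = 50.0, 70.0
overconfident = gaussian_nll(mean, np.log(4.0), target)    # predicted std = 2
honest = gaussian_nll(mean, np.log(400.0), target)         # predicted std = 20
```

This trade-off, where the loss penalizes both large errors and needlessly wide variances, is what yields the calibrated intervals the abstract reports.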
https://arxiv.org/abs/2511.19124
This paper compares Kolmogorov-Arnold Networks (KANs) and Long Short-Term Memory networks (LSTMs) for forecasting non-deterministic stock price data, evaluating predictive accuracy versus interpretability trade-offs using Root Mean Square Error (RMSE). LSTMs demonstrate substantial superiority across all tested prediction horizons, confirming their established effectiveness for sequential data modelling. Standard KANs, while offering theoretical interpretability through the Kolmogorov-Arnold representation theorem, exhibit significantly higher error rates and limited practical applicability for time series forecasting. The results confirm LSTM dominance in accuracy-critical time series applications while identifying computational efficiency as KANs' primary advantage in resource-constrained scenarios where accuracy requirements are less stringent. The findings support LSTM adoption for practical financial forecasting while suggesting that continued research into specialised KAN architectures may yield future improvements.
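For reference, the single comparison metric used throughout can be stated in a few lines (a plain NumPy definition, assuming simple arrays of true and forecast values):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error, the metric used to compare KAN vs LSTM forecasts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```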
https://arxiv.org/abs/2511.18613
We study a systematic approach to a popular Statistical Arbitrage technique: Pairs Trading. Instead of relying on two highly correlated assets, we replace the second asset with a replication of the first using risk factor representations. These factors are obtained through Principal Components Analysis (PCA), exchange-traded funds (ETFs), and, as our main contribution, Long Short-Term Memory networks (LSTMs). Residuals between the main asset and its replication are examined for mean-reversion properties, and trading signals are generated for sufficiently fast mean-reverting portfolios. Beyond introducing a deep-learning-based replication method, we adapt the framework of Avellaneda and Lee (2008) to the Polish market. Accordingly, components of WIG20, mWIG40, and selected sector indices replace the original S&P500 universe, and market parameters such as the risk-free rate and transaction costs are updated to reflect local conditions. We outline the full strategy pipeline: risk factor construction, residual modeling via the Ornstein-Uhlenbeck process, and signal generation. Each replication technique is described together with its practical implementation. Strategy performance is evaluated over two periods: 2017-2019 and the recessive year 2020. All methods yield profits in 2017-2019, with PCA achieving roughly 20 percent cumulative return and an annualized Sharpe ratio of up to 2.63. Despite multiple adaptations, our conclusions remain consistent with those of the original paper. During the COVID-19 recession, only the ETF-based approach remains profitable (about 5 percent annual return), while PCA and LSTM methods underperform. LSTM results, although negative, are promising and indicate potential for future optimization.
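The residual-modeling step, fitting an Ornstein-Uhlenbeck process and turning it into a standardized trading signal, can be sketched via the AR(1) discretization used by Avellaneda and Lee. This is a simplified illustration under assumed parameter names, not the paper's code:

```python
import numpy as np

def fit_ou(residuals, dt=1.0 / 252):
    """Fit an Ornstein-Uhlenbeck process to a residual series through its
    AR(1) discretization X[t+1] = a + b * X[t] + eps, as in Avellaneda &
    Lee (2008): kappa is the mean-reversion speed, m the long-run mean,
    sigma_eq the equilibrium standard deviation."""
    x, y = residuals[:-1], residuals[1:]
    b, a = np.polyfit(x, y, 1)                        # OLS slope, intercept
    eps = y - (a + b * x)
    kappa = -np.log(b) / dt
    m = a / (1.0 - b)
    sigma_eq = np.sqrt(np.var(eps) / (1.0 - b * b))
    return kappa, m, sigma_eq

def s_score(x_now, m, sigma_eq):
    """Standardized distance from equilibrium; thresholds on it open/close trades."""
    return (x_now - m) / sigma_eq

# Simulate a fast mean-reverting residual and recover its parameters.
rng = np.random.default_rng(0)
true_kappa, dt = 30.0, 1.0 / 252
x = np.zeros(2000)
for t in range(1999):
    x[t + 1] = x[t] - true_kappa * x[t] * dt + 0.1 * np.sqrt(dt) * rng.standard_normal()
kappa, m, sigma_eq = fit_ou(x, dt)
```

"Sufficiently fast mean reversion" in the abstract corresponds to a lower bound on the fitted `kappa`; portfolios that revert too slowly are discarded.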
https://arxiv.org/abs/2512.02037
Reliable hydrologic and flood forecasting requires models that remain stable when input data are delayed, missing, or inconsistent. However, most advances in rainfall-runoff prediction have been evaluated under ideal data conditions, emphasizing accuracy rather than operational resilience. Here, we develop an operationally ready emulator of the Global Flood Awareness System (GloFAS) that couples long short-term memory (LSTM) networks with a relaxed water-balance constraint to preserve physical coherence. Five architectures span a continuum of information availability, from complete historical and forecast forcings to scenarios with data latency and outages, allowing systematic evaluation of robustness. Trained on minimally managed catchments across the United States and tested in more than 5,000 basins, including heavily regulated rivers in India, the emulator reproduces the hydrological core of GloFAS and degrades smoothly as information quality declines. Transfer across contrasting hydroclimatic and management regimes yields reduced yet physically consistent performance, defining the limits of generalization under data scarcity and human influence. The framework establishes operational robustness as a measurable property of hydrological machine learning and advances the design of reliable real-time forecasting systems.
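A "relaxed" water-balance constraint can be read as a soft physics penalty added to the data-fit loss. The tolerance-band form, variable names, and weights below are my assumptions for illustration, not GloFAS's or the paper's exact formulation:

```python
import numpy as np

def water_balance_penalty(precip, evap, runoff, d_storage, tol=0.05):
    """Relaxed water-balance constraint: penalize only closure errors
    P - E - Q - dS that exceed a tolerance band, keeping predictions
    physically coherent without over-constraining the network."""
    closure = precip - evap - runoff - d_storage
    excess = np.maximum(np.abs(closure) - tol, 0.0)
    return float(np.mean(excess ** 2))

def total_loss(pred_q, obs_q, precip, evap, d_storage, lam=0.1):
    """Data-fit MSE plus the weighted physics penalty on predicted runoff."""
    mse = float(np.mean((pred_q - obs_q) ** 2))
    return mse + lam * water_balance_penalty(precip, evap, pred_q, d_storage)
```

The tolerance band is what makes the constraint "relaxed": small closure errors (measurement noise, unmodeled stores) go unpenalized, while large physically implausible ones are pushed down.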
https://arxiv.org/abs/2510.18535
We present a reproducibility study of the state-of-the-art neural architecture for sequence labeling proposed by Ma and Hovy (2016). The original BiLSTM-CNN-CRF model combines character-level representations via Convolutional Neural Networks (CNNs), word-level context modeling through Bi-directional Long Short-Term Memory networks (BiLSTMs), and structured prediction using Conditional Random Fields (CRFs). This end-to-end approach eliminates the need for hand-crafted features while achieving excellent performance on named entity recognition (NER) and part-of-speech (POS) tagging tasks. Our implementation successfully reproduces the key results, achieving a 91.18% F1-score on CoNLL-2003 NER and demonstrating the model's effectiveness across sequence labeling tasks. We provide a detailed analysis of the architecture components and release an open-source PyTorch implementation to facilitate further research.
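The CRF layer's inference step, finding the best tag sequence given the BiLSTM's emission scores, is standard Viterbi decoding. A self-contained NumPy sketch of just that step (training, the char-CNN, and the BiLSTM are omitted):

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag sequence for a linear-chain CRF.

    emissions:   (T, K) per-position tag scores from the BiLSTM
    transitions: (K, K) score of moving from tag i to tag j
    Returns the argmax tag sequence, as used at inference in BiLSTM-CNN-CRF.
    """
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j] = best score ending in tag i, then transitioning to j
        cand = score[:, None] + transitions
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(K)] + emissions[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With strongly diagonal transition scores the decoder prefers staying in one tag even when a single emission disagrees, which is exactly the structured smoothing the CRF adds over per-token argmax.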
https://arxiv.org/abs/2510.10936
Social media has become an essential part of the digital age, serving as a platform for communication, interaction, and information sharing. Celebrities are among the most active users and often reveal aspects of their personal and professional lives through online posts. Platforms such as Twitter provide an opportunity to analyze language and behavior for understanding demographic and social patterns. Since followers frequently share linguistic traits and interests with the celebrities they follow, textual data from followers can be used to predict celebrity demographics. However, most existing research in this field has focused on English and other high-resource languages, leaving Urdu largely unexplored. This study applies modern machine learning and deep learning techniques to the problem of celebrity profiling in Urdu. A dataset of short Urdu tweets from followers of subcontinent celebrities was collected and preprocessed. Multiple algorithms were trained and compared, including Logistic Regression, Support Vector Machines, Random Forests, Convolutional Neural Networks, and Long Short-Term Memory networks. The models were evaluated using accuracy, precision, recall, F1-score, and cumulative rank (cRank). The best performance was achieved for gender prediction with a cRank of 0.65 and an accuracy of 0.65, followed by moderate results for age, profession, and fame prediction. These results demonstrate that follower-based linguistic features can be effectively leveraged using machine learning and neural approaches for demographic prediction in Urdu, a low-resource language.
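The cRank metric aggregates per-trait performance into a single number. In the PAN celebrity-profiling shared tasks it is defined as the harmonic mean of the per-trait scores; I assume the same convention here, so treat this definition as an assumption:

```python
import numpy as np

def c_rank(scores):
    """Combined rank over per-trait scores (e.g. gender, age, profession, fame).

    Assumed definition: harmonic mean of the individual trait scores, following
    the PAN celebrity-profiling shared tasks. The harmonic mean rewards models
    that do reasonably well on *every* trait rather than excelling on one.
    """
    scores = np.asarray(scores, dtype=float)
    return float(len(scores) / np.sum(1.0 / scores))
```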
https://arxiv.org/abs/2510.11739
This review conducts a comparative analysis of liquid neural networks (LNNs) and traditional recurrent neural networks (RNNs) and their variants, such as long short-term memory networks (LSTMs) and gated recurrent units (GRUs). The core dimensions of the analysis are model accuracy, memory efficiency, and generalization ability. By systematically reviewing existing research, this paper explores the basic principles, mathematical models, key characteristics, and inherent challenges of these neural network architectures in processing sequential data. Research findings reveal that LNNs, as emerging, biologically inspired, continuous-time dynamic neural networks, demonstrate significant potential in handling noisy, non-stationary data and achieving out-of-distribution (OOD) generalization. Additionally, some LNN variants outperform traditional RNNs in parameter efficiency and computational speed. However, RNNs remain a cornerstone of sequence modeling due to their mature ecosystem and successful applications across various tasks. This review identifies the commonalities and differences between LNNs and RNNs, summarizes their respective shortcomings and challenges, and points out valuable directions for future research, particularly emphasizing the importance of improving the scalability of LNNs to promote their application in broader and more complex scenarios.
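The defining trait of liquid networks is an input-dependent effective time constant, in contrast to the fixed gating of LSTMs/GRUs. A one-neuron Euler-integration sketch of the liquid time-constant (LTC) dynamics of Hasani et al.; parameter values are illustrative:

```python
import numpy as np

def ltc_step(x, inp, w_in, w_rec, tau, A, dt=0.01):
    """One Euler step of a liquid time-constant (LTC) cell:

        dx/dt = -x / tau + f(x, u) * (A - x)

    The synaptic nonlinearity f both drives the state toward A and shortens
    the effective time constant, so each neuron's dynamics adapt to its
    input -- the 'liquid' behavior contrasted with fixed-gate LSTM updates.
    """
    f = 1.0 / (1.0 + np.exp(-(w_in * inp + w_rec * x)))  # sigmoid synapse
    dx = -x / tau + f * (A - x)
    return x + dt * dx
```

Because the equilibrium is f*A / (1/tau + f), the state stays bounded in (0, A) and settles higher for stronger inputs, a continuous-time behavior with no discrete gate.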
https://arxiv.org/abs/2510.07578
Solar Proton Events (SPEs) cause significant radiation hazards to satellites, astronauts, and technological systems. Accurate forecasting of their proton flux time profiles is crucial for early warnings and mitigation. This paper explores deep learning sequence-to-sequence (seq2seq) models based on Long Short-Term Memory networks to predict 24-hour proton flux profiles following SPE onsets. We used a dataset of 40 well-connected SPEs (1997-2017) observed by NOAA GOES, each associated with a >=M-class western-hemisphere solar flare and undisturbed proton flux profiles. Using 4-fold stratified cross-validation, we evaluate seq2seq model configurations (varying hidden units and embedding dimensions) under multiple forecasting scenarios: (i) proton-only input vs. combined proton+X-ray input, (ii) original flux data vs. trend-smoothed data, and (iii) autoregressive vs. one-shot forecasting. Our major results are as follows: First, one-shot forecasting consistently yields lower error than autoregressive prediction, avoiding the error accumulation seen in iterative approaches. Second, on the original data, proton-only models outperform proton+X-ray models; with trend-smoothed data, however, this gap narrows or reverses in favor of proton+X-ray models. Third, trend-smoothing significantly enhances the performance of proton+X-ray models by mitigating fluctuations in the X-ray channel. Fourth, while models trained on trend-smoothed data perform best on average, the single best-performing model was trained on original data, suggesting that architectural choices can sometimes outweigh the benefits of data preprocessing.
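The first finding, that one-shot decoding avoids the error accumulation of autoregressive decoding, can be demonstrated on a toy decaying system with a slightly biased model. This is a generic illustration of the two decoding strategies, with no LSTM involved:

```python
import numpy as np

def autoregressive_forecast(model_step, history, horizon):
    """Iterative decoding: each prediction is fed back as the next input,
    so one-step model errors compound over the horizon."""
    seq = list(history)
    for _ in range(horizon):
        seq.append(model_step(seq[-1]))
    return np.array(seq[len(history):])

def one_shot_forecast(model_multi, history, horizon):
    """Direct decoding: all `horizon` values are emitted from the observed
    history in a single pass, so errors cannot feed back into the inputs."""
    return model_multi(history, horizon)

def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Toy flux decay x[t+1] = 0.9 * x[t]; the 'learned' model carries a small bias.
step_biased = lambda x: 0.9 * x + 0.05
multi_biased = lambda h, H: np.array([0.9 ** k * h[-1] + 0.05 for k in range(1, H + 1)])
truth = np.array([0.9 ** k for k in range(1, 25)])
ar_err = rmse(autoregressive_forecast(step_biased, [1.0], 24), truth)
direct_err = rmse(one_shot_forecast(multi_biased, [1.0], 24), truth)
```

Here the one-shot forecaster keeps a constant 0.05 bias, while the autoregressive loop compounds the same bias toward its fixed point and drifts far from the true decay, mirroring the paper's first result.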
https://arxiv.org/abs/2510.05399
This study applies a range of forecasting techniques, including ARIMA, Prophet, Long Short-Term Memory networks (LSTM), Temporal Convolutional Networks (TCN), and XGBoost, to model and predict Russian equipment losses during the ongoing war in Ukraine. Drawing on daily and monthly open-source intelligence (OSINT) data from WarSpotting, we aim to assess trends in attrition, evaluate model performance, and estimate future loss patterns through the end of 2025. Our findings show that deep learning models, particularly TCN and LSTM, produce stable and consistent forecasts, especially under conditions of high temporal granularity. By comparing different model architectures and input structures, this study highlights the importance of ensemble forecasting in conflict modeling and the value of publicly available OSINT data in quantifying material degradation over time.
https://arxiv.org/abs/2509.07813
Speech Emotion Recognition (SER) presents a significant yet persistent challenge in human-computer interaction. While deep learning has advanced spoken language processing, achieving high performance on limited datasets remains a critical hurdle. This paper confronts this issue by developing and evaluating a suite of machine learning models, including Support Vector Machines (SVMs), Long Short-Term Memory networks (LSTMs), and Convolutional Neural Networks (CNNs), for automated emotion classification in human speech. We demonstrate that by strategically employing transfer learning and innovative data augmentation techniques, our models can achieve impressive performance despite the constraints of a relatively small dataset. Our most effective model, a ResNet34 architecture, establishes a new performance benchmark on the combined RAVDESS and SAVEE datasets, attaining an accuracy of 66.7% and an F1 score of 0.631. These results underscore the substantial benefits of leveraging pre-trained models and data augmentation to overcome data scarcity, thereby paving the way for more robust and generalizable SER systems.
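Two of the cheap waveform augmentations commonly used to stretch small SER corpora, additive noise at a target SNR and a random time shift, look like this. The specific parameters and augmentation choices are illustrative assumptions, not the paper's:

```python
import numpy as np

def augment(wave, rng, snr_db=20.0, max_shift=800):
    """Additive Gaussian noise at a target SNR plus a random circular time
    shift; both preserve the emotion label while diversifying the inputs."""
    sig_pow = float(np.mean(wave ** 2))
    noise_pow = sig_pow / (10.0 ** (snr_db / 10.0))
    noisy = wave + rng.standard_normal(wave.shape) * np.sqrt(noise_pow)
    return np.roll(noisy, int(rng.integers(-max_shift, max_shift + 1)))
```

Each call yields a new label-preserving variant of the same utterance, multiplying the effective size of a small dataset before feature extraction.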
https://arxiv.org/abs/2509.00077
Visual reasoning is critical for a wide range of computer vision tasks that go beyond surface-level object detection and classification. Despite notable advances in relational, symbolic, temporal, causal, and commonsense reasoning, existing surveys often address these directions in isolation, lacking a unified analysis and comparison across reasoning types, methodologies, and evaluation protocols. This survey aims to address this gap by categorizing visual reasoning into five major types (relational, symbolic, temporal, causal, and commonsense) and systematically examining their implementation through architectures such as graph-based models, memory networks, attention mechanisms, and neuro-symbolic systems. We review evaluation protocols designed to assess functional correctness, structural consistency, and causal validity, and critically analyze their limitations in terms of generalizability, reproducibility, and explanatory power. Beyond evaluation, we identify key open challenges in visual reasoning, including scalability to complex scenes, deeper integration of symbolic and neural paradigms, the lack of comprehensive benchmark datasets, and reasoning under weak supervision. Finally, we outline a forward-looking research agenda for next-generation vision systems, emphasizing that bridging perception and reasoning is essential for building transparent, trustworthy, and cross-domain adaptive AI systems, particularly in critical domains such as autonomous driving and medical diagnostics.
https://arxiv.org/abs/2508.10523
We introduce a novel class of untrained Recurrent Neural Networks (RNNs) within the Reservoir Computing (RC) paradigm, called Residual Reservoir Memory Networks (ResRMNs). ResRMN combines a linear memory reservoir with a non-linear reservoir, where the latter is based on residual orthogonal connections along the temporal dimension for enhanced long-term propagation of the input. The resulting reservoir state dynamics are studied through the lens of linear stability analysis, and we investigate diverse configurations for the temporal residual connections. The proposed approach is empirically assessed on time-series and pixel-level 1-D classification tasks. Our experimental results highlight the advantages of the proposed approach over other conventional RC models.
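The state update combining a linear orthogonal skip path with a nonlinear reservoir branch can be sketched as follows. This is an illustrative reading of "residual orthogonal connections along the temporal dimension", not the paper's exact parameterization:

```python
import numpy as np

def orthogonal(n, seed=1):
    """Random orthogonal matrix (QR of a Gaussian matrix); multiplying by it
    preserves the norm of the state it propagates."""
    Q, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((n, n)))
    return Q

def make_reservoir(n, seed=0, rho=0.9):
    """Untrained random reservoir matrix rescaled to spectral radius rho."""
    W = np.random.default_rng(seed).standard_normal((n, n)) / np.sqrt(n)
    return rho * W / np.max(np.abs(np.linalg.eigvals(W)))

def resrmn_step(x, u, O, W, w_in, alpha=0.5):
    """State update with a residual orthogonal temporal connection:

        x[t+1] = O @ x[t] + alpha * tanh(W @ x[t] + w_in * u[t])

    The orthogonal skip path carries past states forward without shrinking
    their norm (helping long-term propagation of the input), while the
    tanh reservoir branch adds nonlinearity. Per the RC paradigm, neither
    matrix is trained; only a readout on the states would be fitted.
    """
    return O @ x + alpha * np.tanh(W @ x + w_in * u)
```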
https://arxiv.org/abs/2508.09925
The evolution towards future generation of mobile systems and fixed wireless networks is primarily driven by the urgency to support high-bandwidth and low-latency services across various vertical sectors. This endeavor is fueled by smartphones as well as technologies like industrial internet of things, extended reality (XR), and human-to-machine (H2M) collaborations for fostering industrial and social revolutions like Industry 4.0/5.0 and Society 5.0. To ensure an ideal immersive experience and avoid cyber-sickness for users in all the aforementioned usage scenarios, it is typically challenging to synchronize XR content from a remote machine to a human collaborator according to their head movements across a large geographic span in real-time over communication networks. Thus, we propose a novel H2M collaboration scheme where the human's head movements are predicted ahead with highly accurate models like bidirectional long short-term memory networks to orient the machine's camera in advance. We validate that XR frame size varies in accordance with the human's head movements and predict the corresponding bandwidth requirements from the machine's camera to propose a human-machine coordinated dynamic bandwidth allocation (HMC-DBA) scheme. Through extensive simulations, we show that end-to-end latency and jitter requirements of XR frames are satisfied with much lower bandwidth consumption over enterprise networks like Fiber-To-The-Room-Business. Furthermore, we show that better efficiency in network resource utilization is achieved by employing our proposed HMC-DBA over state-of-the-art schemes.
https://arxiv.org/abs/2507.15254
The proliferation of large language models (LLMs) has significantly transformed the digital information landscape, making it increasingly challenging to distinguish between human-written and LLM-generated content. Detecting LLM-generated information is essential for preserving trust on digital platforms (e.g., social media and e-commerce sites) and preventing the spread of misinformation, a topic that has garnered significant attention in IS research. However, current detection methods, which primarily focus on identifying content generated by specific LLMs in known domains, face challenges in generalizing to new (i.e., unseen) LLMs and domains. This limitation reduces their effectiveness in real-world applications, where the number of LLMs is rapidly multiplying and content spans a vast array of domains. In response, we introduce a general LLM detector (GLD) that combines a twin memory networks design and a theory-guided detection generalization module to detect LLM-generated information across unseen LLMs and domains. Using real-world datasets, we conduct extensive empirical evaluations and case studies to demonstrate the superiority of GLD over state-of-the-art detection methods. The study has important academic and practical implications for digital platforms and LLMs.
https://arxiv.org/abs/2506.21589
Grain growth simulation is crucial for predicting metallic material microstructure evolution during annealing and the resulting final mechanical properties, but traditional partial differential equation-based methods are computationally expensive, creating bottlenecks in materials design and manufacturing. In this work, we introduce a machine learning framework that combines a Convolutional Long Short-Term Memory network with an Autoencoder to efficiently predict grain growth evolution. Our approach captures both spatial and temporal aspects of grain evolution while encoding high-dimensional grain structure data into a compact latent space for pattern learning, enhanced by a novel composite loss function combining Mean Squared Error, Structural Similarity Index Measurement, and Boundary Preservation to maintain the structural integrity of the predicted grain-boundary topology. Results demonstrate that our machine learning approach accelerates grain growth prediction by up to 89×, reducing computation time from 10 minutes to approximately 10 seconds while maintaining high-fidelity predictions. The best model (S-30-30) achieves a structural similarity score of 86.71% and a mean grain size error of just 0.07%. All models accurately captured grain boundary topology, morphology, and size distributions. This approach enables rapid microstructural prediction for applications where conventional simulations are prohibitively time-consuming, potentially accelerating innovation in materials science and manufacturing.
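The composite loss can be sketched as a weighted sum of its three named terms. The SSIM here is a simplified single-window variant, the boundary indicator is a crude label-change map, and the weights are illustrative, so this is a sketch of the structure rather than the paper's implementation:

```python
import numpy as np

def ssim_global(a, b, c1=1e-4, c2=9e-4):
    """Simplified single-window SSIM over the whole image (no sliding window)."""
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (va + vb + c2))

def boundary_map(img):
    """Rough grain-boundary indicator: pixels where the label/value changes."""
    gx = np.abs(np.diff(img, axis=0, prepend=img[:1])) > 0
    gy = np.abs(np.diff(img, axis=1, prepend=img[:, :1])) > 0
    return (gx | gy).astype(float)

def composite_loss(pred, target, w=(1.0, 1.0, 1.0)):
    """Weighted sum of MSE, (1 - SSIM), and a boundary-preservation term,
    mirroring the structure of the composite loss the paper describes."""
    mse = np.mean((pred - target) ** 2)
    ssim_term = 1.0 - ssim_global(pred, target)
    bnd = np.mean((boundary_map(pred) - boundary_map(target)) ** 2)
    return w[0] * mse + w[1] * ssim_term + w[2] * bnd
```

The three terms pull in complementary directions: MSE matches pixel values, the SSIM term preserves perceived structure, and the boundary term keeps grain edges where topology errors would otherwise be cheap.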
https://arxiv.org/abs/2505.05354
Video Object Segmentation (VOS) is one of the most fundamental and challenging tasks in computer vision and has a wide range of applications. Most existing methods rely on spatiotemporal memory networks to extract frame-level features and have achieved promising results on commonly used datasets. However, these methods often struggle in more complex real-world scenarios. This paper addresses this issue, aiming to achieve accurate segmentation of video objects in challenging scenes. We propose fine-tuning VOS (FVOS), optimizing existing methods for specific datasets through tailored training. Additionally, we introduce a morphological post-processing strategy to address the issue of excessively large gaps between adjacent objects in single-model predictions. Finally, we apply a voting-based fusion method on multi-scale segmentation results to generate the final output. Our approach achieves J&F scores of 76.81% and 83.92% during the validation and testing stages, respectively, securing third place overall in the MOSE Track of the 4th PVUW challenge 2025.
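The multi-scale voting fusion and the morphological gap-filling step can be sketched in pure NumPy. The strict-majority rule and the 3x3 kernel are generic assumptions, not the exact settings used in the challenge entry:

```python
import numpy as np

def vote_fusion(masks):
    """Pixel-wise majority vote over binary masks predicted at multiple
    scales (each already resized to a common resolution)."""
    stack = np.stack([np.asarray(m, dtype=int) for m in masks])
    return (2 * stack.sum(axis=0) > len(masks)).astype(np.uint8)

def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 structuring element, the kind of
    morphological post-processing used to close overly large gaps
    between adjacent objects."""
    m = np.asarray(mask, dtype=bool)
    for _ in range(iterations):
        p = np.pad(m, 1)
        m = (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2]
             | p[1:-1, 2:] | p[:-2, :-2] | p[:-2, 2:] | p[2:, :-2] | p[2:, 2:])
    return m.astype(np.uint8)
```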
https://arxiv.org/abs/2504.09507
Mainstream visual object tracking frameworks predominantly rely on template matching paradigms. Their performance heavily depends on the quality of template features, which becomes increasingly challenging to maintain in complex scenarios involving target deformation, occlusion, and background clutter. While existing spatiotemporal memory-based trackers emphasize memory capacity expansion, they lack effective mechanisms for dynamic feature selection and adaptive fusion. To address this gap, we propose a Dynamic Attention Mechanism in Spatiotemporal Memory Network (DASTM) with two key innovations: 1) A differentiable dynamic attention mechanism that adaptively adjusts channel-spatial attention weights by analyzing spatiotemporal correlations between the templates and memory features; 2) A lightweight gating network that autonomously allocates computational resources based on target motion states, prioritizing high-discriminability features in challenging scenarios. Extensive evaluations on OTB-2015, VOT 2018, LaSOT, and GOT-10K benchmarks demonstrate our DASTM's superiority, achieving state-of-the-art performance in success rate, robustness, and real-time efficiency, thereby offering a novel solution for real-time tracking in complex environments.
https://arxiv.org/abs/2503.16768
Video object segmentation is crucial for the efficient analysis of complex medical video data, yet it faces significant challenges in data availability and annotation. We introduce the task of one-shot medical video object segmentation, which requires separating foreground and background pixels throughout a video given only the mask annotation of the first frame. To address this problem, we propose a temporal contrastive memory network comprising image and mask encoders to learn feature representations, a temporal contrastive memory bank that aligns embeddings from adjacent frames while pushing apart distant ones to explicitly model inter-frame relationships and stores these features, and a decoder that fuses encoded image features and memory readouts for segmentation. We also collect a diverse, multi-source medical video dataset spanning various modalities and anatomies to benchmark this task. Extensive experiments demonstrate state-of-the-art performance in segmenting both seen and unseen structures from a single exemplar, showing ability to generalize from scarce labels. This highlights the potential to alleviate annotation burdens for medical video analysis. Code is available at this https URL.
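The memory bank's objective, aligning embeddings of adjacent frames while pushing apart distant ones, is the shape of an InfoNCE-style temporal contrastive loss. A generic NumPy formulation of that idea (not the paper's exact loss; the encoders are omitted):

```python
import numpy as np

def temporal_contrastive_loss(embs, tau=0.1):
    """InfoNCE-style objective over per-frame embeddings: each frame's
    positive is its temporal neighbor (t+1) and all other frames act as
    negatives, so minimizing it aligns adjacent-frame features while
    pushing distant frames apart."""
    z = embs / np.linalg.norm(embs, axis=1, keepdims=True)  # cosine space
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                          # exclude self-pairs
    losses = []
    for t in range(len(z) - 1):
        logits = sim[t]
        losses.append(np.log(np.sum(np.exp(logits))) - logits[t + 1])
    return float(np.mean(losses))
```

A smooth trajectory of frame embeddings (neighbors most similar) scores a lower loss than the same embeddings in shuffled temporal order, which is exactly the inter-frame structure the memory bank is meant to capture.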
https://arxiv.org/abs/2503.14979
The artificial lateral line (ALL) is a bioinspired flow sensing system for underwater robots, comprising distributed flow sensors. The ALL has been successfully applied to detect the undulatory flow fields generated by body undulation and tail-flapping of bioinspired robotic fish. However, its feasibility and performance in sensing the undulatory flow fields produced by human leg kicks during swimming have not been systematically tested and studied. This paper presents a novel sensing framework to investigate the undulatory flow field generated by a swimmer's leg kicks, leveraging bioinspired ALL sensing. To evaluate the feasibility of using the ALL system for sensing the undulatory flow fields generated by swimmer leg kicks, this paper designs an experimental platform integrating an ALL system and a lab-fabricated human leg model. To enhance the accuracy of flow sensing, this paper proposes a feature extraction method that dynamically fuses time-domain and time-frequency characteristics. Specifically, time-domain features are extracted using one-dimensional convolutional neural networks and bidirectional long short-term memory networks (1DCNN-BiLSTM), while time-frequency features are extracted using the short-time Fourier transform and two-dimensional convolutional neural networks (STFT-2DCNN). These features are then dynamically fused via attention mechanisms to achieve accurate sensing of the undulatory flow field. Furthermore, extensive experiments are conducted to test various scenarios inspired by human swimming, such as leg kick pattern recognition and kicking leg localization, achieving satisfactory results.
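The time-frequency branch's front end, a short-time Fourier transform turning each sensor signal into the 2-D map a CNN can consume, fits in a few lines. Window and hop sizes below are illustrative, not the paper's settings:

```python
import numpy as np

def stft(signal, win=64, hop=16):
    """Short-time Fourier transform: Hann-windowed frames -> rFFT magnitudes,
    producing the time-frequency map fed to the 2-D CNN branch."""
    w = np.hanning(win)
    frames = [signal[i:i + win] * w for i in range(0, len(signal) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))  # (n_frames, win // 2 + 1)

# A 125 Hz tone sampled at 2 kHz should peak in bin f0 / (fs / win) = 4.
fs, f0 = 2000, 125
t = np.arange(fs) / fs
S = stft(np.sin(2 * np.pi * f0 * t))
```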
https://arxiv.org/abs/2503.07312