We introduce ALT, an open-source Python package created for efficient and accurate time series classification (TSC). The package implements the adaptive law-based transformation (ALT) algorithm, which transforms raw time series data into a linearly separable feature space using variable-length shifted time windows. This adaptive approach enhances its predecessor, the linear law-based transformation (LLT), by effectively capturing patterns of varying temporal scales. The software is designed for scalability, interpretability, and ease of use, achieving state-of-the-art performance with minimal computational overhead. Extensive benchmarking on real-world datasets demonstrates the utility of ALT for diverse TSC tasks in physics and related domains.
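A minimal, self-contained sketch of the variable-length shifted-window idea behind ALT. The function name, window lengths, stride, and z-normalization step are illustrative assumptions, not the package's actual API.

```python
# Sketch: collect shifted windows of several lengths from a 1-D series and
# z-normalize them so windows of different scales become comparable features
# for a downstream linear classifier (the spirit of ALT's transformation).
import numpy as np

def extract_windows(series: np.ndarray, lengths=(8, 16, 32), stride=4):
    features = []
    for L in lengths:
        for start in range(0, len(series) - L + 1, stride):
            w = series[start:start + L].astype(float)
            std = w.std()
            if std > 0:  # skip constant windows that cannot be normalized
                features.append(((w - w.mean()) / std, L, start))
    return features

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sin(np.linspace(0, 20, 200)) + 0.1 * rng.standard_normal(200)
    windows = extract_windows(x)
    print(f"extracted {len(windows)} windows across 3 scales")
```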
https://arxiv.org/abs/2504.12841
Recent deep learning models for Long-term Time Series Forecasting (LTSF) often emphasize complex, handcrafted designs, while simpler architectures such as linear models or MLPs have repeatedly outperformed these intricate solutions. In this paper, we revisit and organize the core ideas behind several key techniques, such as redundancy reduction and multi-scale modeling, which are frequently employed in advanced LTSF models. Our goal is to streamline these ideas for more efficient deep learning utilization. To this end, we introduce TimeCapsule, a model built around the principle of high-dimensional information compression that unifies these techniques in a generalized yet simplified framework. Specifically, we model time series as a 3D tensor, incorporating temporal, variate, and level dimensions, and leverage mode production to capture multi-mode dependencies while achieving dimensionality compression. We propose an internal forecast within the compressed representation domain, supported by the Joint-Embedding Predictive Architecture (JEPA), to monitor the learning of predictive representations. Extensive experiments on challenging benchmarks demonstrate the versatility of our method, showing that TimeCapsule can achieve state-of-the-art performance.
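The mode product named in the abstract is standard tensor algebra: multiplying a 3D tensor along one mode by a matrix compresses that mode. A hedged sketch follows; the shapes and the random stand-in for a learned matrix are assumptions, not the paper's code.

```python
# Sketch: compress one mode of a (time, variate, level) tensor via a mode product.
import numpy as np

def mode_product(tensor: np.ndarray, matrix: np.ndarray, mode: int) -> np.ndarray:
    """Multiply `tensor` along `mode` by `matrix` (rows = new size of that mode)."""
    t = np.moveaxis(tensor, mode, 0)                # bring target mode to front
    out = np.tensordot(matrix, t, axes=([1], [0]))  # contract the old mode away
    return np.moveaxis(out, 0, mode)

# Example: compress the temporal mode of a (time=96, variates=7, levels=4) tensor.
X = np.random.randn(96, 7, 4)
W_time = np.random.randn(24, 96)  # learned in practice; random here
Z = mode_product(X, W_time, mode=0)
print(Z.shape)                    # (24, 7, 4)
```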
https://arxiv.org/abs/2504.12721
Current sign language machine translation systems rely on recognizing hand movements, facial expressions, and body postures, combined with natural language processing, to convert signs into text. Recent approaches use Transformer architectures to model long-range dependencies via positional encoding. However, they lack accuracy in recognizing fine-grained, short-range temporal dependencies between gestures captured at high frame rates. Moreover, their high computational complexity leads to inefficient training. To mitigate these issues, we propose an Adaptive Transformer (ADAT), which incorporates components for enhanced feature extraction and adaptive feature weighting through a gating mechanism to emphasize contextually relevant features while reducing training overhead and maintaining translation accuracy. To evaluate ADAT, we introduce MedASL, the first public medical American Sign Language dataset. In sign-to-gloss-to-text experiments, ADAT outperforms the encoder-decoder transformer, improving BLEU-4 accuracy by 0.1% while reducing training time by 14.33% on PHOENIX14T and 3.24% on MedASL. In sign-to-text experiments, it improves accuracy by 8.7% and reduces training time by 2.8% on PHOENIX14T, and achieves 4.7% higher accuracy and 7.17% faster training on MedASL. Compared to encoder-only and decoder-only baselines in sign-to-text, ADAT is at least 6.8% more accurate despite being up to 12.1% slower due to its dual-stream structure.
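The adaptive feature-weighting gate can be illustrated in a few lines: a sigmoid gate scales each feature channel by its contextual relevance. The module name and layer sizes below are assumptions; ADAT's actual architecture contains further components.

```python
# Sketch: sigmoid gating that re-weights per-frame gesture features.
import torch
import torch.nn as nn

class FeatureGate(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); gate values in (0, 1) emphasize relevant features
        return x * self.gate(x)

x = torch.randn(2, 50, 256)       # e.g., per-frame gesture features
print(FeatureGate(256)(x).shape)  # torch.Size([2, 50, 256])
```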
https://arxiv.org/abs/2504.11942
Time-series anomaly detection, which detects errors and failures in a workflow, is one of the most important topics in real-world applications. The purpose of time-series anomaly detection is to reduce potential damages or losses. However, existing anomaly detection models detect anomalies through the error between the model output and the ground-truth (observed) value, so an anomaly can only be flagged after the observation has arrived, which makes them impractical for prevention. In this work, we present a proactive approach for time-series anomaly detection based on a time-series forecasting model specialized for anomaly detection and a data-driven anomaly detection model. Our proactive approach establishes an anomaly threshold from training data with a data-driven anomaly detection model, and anomalies are subsequently detected by identifying predicted values that exceed the anomaly threshold. In addition, we extensively evaluated the model using four anomaly detection benchmarks and analyzed both predictable and unpredictable anomalies. We attached the source code as supplementary material.
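A minimal sketch of the proactive scheme described above, under assumed details: the threshold here is a simple quantile of training data (the paper uses a learned detection model), and flags are raised on forecast values before the ground-truth observation arrives.

```python
# Sketch: fit a data-driven threshold on training data, then flag forecasts
# that exceed it; detection happens before the observation is available.
import numpy as np

def fit_threshold(train: np.ndarray, q: float = 0.995) -> float:
    return float(np.quantile(train, q))

def proactive_flags(forecast: np.ndarray, threshold: float) -> np.ndarray:
    return forecast > threshold

train = np.random.default_rng(1).normal(0, 1, 10_000)
tau = fit_threshold(train)
forecast = np.array([0.2, 1.1, 3.9, 0.7])  # output of a forecasting model
print(proactive_flags(forecast, tau))       # [False False  True False]
```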
https://arxiv.org/abs/2504.11623
Timing of clinical events is central to characterization of patient trajectories, enabling analyses such as process tracing, forecasting, and causal reasoning. However, structured electronic health records capture few data elements critical to these tasks, while clinical reports lack temporal localization of events in structured form. We present a system that transforms case reports into textual time series: structured pairs of textual events and timestamps. We contrast manual and large language model (LLM) annotations (n=320 and n=390, respectively) of ten case reports randomly sampled from the PubMed open-access (PMOA) corpus (N=152,974) and assess inter-LLM agreement (n=3,103; N=93). We find that the LLM models have moderate event recall (O1-preview: 0.80) but high temporal concordance among identified events (O1-preview: 0.95). By establishing the task, annotation, and assessment systems, and by demonstrating high concordance, this work may serve as a benchmark for leveraging the PMOA corpus for temporal analytics.
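A sketch of the kind of temporal-concordance check used to compare two timelines: the fraction of event pairs whose relative order agrees. The pairing of events by shared label and the metric's exact definition are assumptions for illustration.

```python
# Sketch: pairwise ordering agreement between two (event -> timestamp) maps.
from itertools import combinations

def concordance(ts_a: dict, ts_b: dict) -> float:
    shared = sorted(set(ts_a) & set(ts_b))
    pairs = list(combinations(shared, 2))
    if not pairs:
        return float("nan")
    agree = sum(
        ((ts_a[i] - ts_a[j]) * (ts_b[i] - ts_b[j])) > 0  # same sign = same order
        for i, j in pairs
    )
    return agree / len(pairs)

manual = {"fever": 0, "admission": 2, "antibiotics": 5}
llm = {"fever": 0, "admission": 3, "antibiotics": 4}
print(concordance(manual, llm))  # 1.0: identical relative ordering
```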
https://arxiv.org/abs/2504.12350
While image captioning has gained significant attention, the potential of captioning time-series images, prevalent in areas like finance and healthcare, remains largely untapped. Existing time-series captioning methods typically offer generic, domain-agnostic descriptions of time-series shapes and struggle to adapt to new domains without substantial retraining. To address these limitations, we introduce TADACap, a retrieval-based framework to generate domain-aware captions for time-series images, capable of adapting to new domains without retraining. Building on TADACap, we propose TADACap-diverse, a novel retrieval strategy that retrieves diverse image-caption pairs from a target-domain database. We benchmarked TADACap-diverse against state-of-the-art methods and ablation variants. TADACap-diverse demonstrates comparable semantic accuracy while requiring significantly less annotation effort.
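Diversity-aware retrieval in this spirit can be sketched with a greedy, MMR-style heuristic: pick database entries that are relevant to the query embedding yet dissimilar to those already selected. The paper's actual strategy may differ; everything below is an illustrative assumption.

```python
# Sketch: greedy diverse retrieval over unit-normalized embeddings.
import numpy as np

def retrieve_diverse(query: np.ndarray, db: np.ndarray, k: int = 3, lam: float = 0.7):
    """query: (d,); db: (n, d) unit-normalized. Returns selected row indices."""
    relevance = db @ query
    selected = []
    while len(selected) < k:
        best, best_score = -1, -np.inf
        for i in range(len(db)):
            if i in selected:
                continue
            redundancy = max((db[i] @ db[j] for j in selected), default=0.0)
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
db = rng.normal(size=(100, 64))
db /= np.linalg.norm(db, axis=1, keepdims=True)
q = db[0] + 0.1 * rng.normal(size=64)
q /= np.linalg.norm(q)
print(retrieve_diverse(q, db))  # indices of relevant yet mutually diverse pairs
```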
https://arxiv.org/abs/2504.11441
Time series are ubiquitous in domains such as energy forecasting, healthcare, and industry. AI systems can handle some tasks within these domains efficiently. Explainable AI (XAI) aims to increase the reliability of AI solutions by explaining model reasoning. For time series, many XAI methods provide point- or sequence-based attribution maps. These methods explain model reasoning in terms of low-level patterns. However, they do not capture high-level patterns that may also influence model reasoning. We propose a concept-based method to provide explanations in terms of these high-level patterns. In this paper, we present C-SHAP for time series, an approach which determines the contribution of concepts to a model outcome. We provide a general definition of C-SHAP and present an example implementation using time series decomposition. Additionally, we demonstrate the effectiveness of the methodology through a use case from the energy domain.
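With few concepts, Shapley contributions can be computed exactly by enumerating coalitions. The sketch below treats decomposition components (trend, seasonal, residual) as concepts; the toy model and the additive reconstruction of the series from component subsets are assumptions, not the paper's implementation.

```python
# Sketch: exact Shapley values of decomposition "concepts" for a model outcome.
from itertools import combinations
from math import factorial
import numpy as np

def shapley_over_concepts(components: dict, model) -> dict:
    names = list(components)
    n = len(names)

    def value(subset):
        # Reconstruct the input from a subset of components (additive assumption).
        x = sum((components[c] for c in subset), np.zeros_like(components[names[0]]))
        return model(x)

    phi = {}
    for c in names:
        others = [o for o in names if o != c]
        total = 0.0
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (value(S + (c,)) - value(S))
        phi[c] = total
    return phi

t = np.arange(100, dtype=float)
components = {
    "trend": 0.05 * t,
    "seasonal": np.sin(2 * np.pi * t / 24),
    "residual": 0.1 * np.random.default_rng(0).standard_normal(100),
}
model = lambda x: float(x.mean())  # stand-in for a trained model's output
print(shapley_over_concepts(components, model))
```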
https://arxiv.org/abs/2504.11159
Spatial imbalances in crop type data pose significant challenges for accurate classification in remote sensing applications. Algorithms aiming at transferring knowledge from data-rich to data-scarce tasks have thus surged in popularity. However, despite their effectiveness in previous evaluations, their performance in challenging real-world applications remains unclear and needs evaluation. This study benchmarks transfer learning and several meta-learning algorithms, including (First-Order) Model-Agnostic Meta-Learning ((FO)-MAML), Almost No Inner Loop (ANIL), and Task-Informed Meta-Learning (TIML), on the real-world EuroCropsML time series dataset, which combines farmer-reported crop data with Sentinel-2 satellite observations from Estonia, Latvia, and Portugal. Our findings indicate that MAML-based meta-learning algorithms achieve slightly higher accuracy compared to simpler transfer learning methods when applied to crop type classification tasks in Estonia after pre-training on data from Latvia. However, this improvement comes at the cost of increased computational demands and training time. Moreover, we find that the transfer of knowledge between geographically disparate regions, such as Estonia and Portugal, poses significant challenges to all investigated algorithms. These insights underscore the trade-offs between accuracy and computational resource requirements in selecting machine learning methods for real-world crop type classification tasks and highlight the difficulties of transferring knowledge between different regions of the Earth. To facilitate future research in this domain, we present the first comprehensive benchmark for evaluating transfer and meta-learning methods for crop type classification under real-world conditions. The corresponding code is publicly available at this https URL.
https://arxiv.org/abs/2504.11022
Foundation models have achieved remarkable success across diverse machine-learning domains through large-scale pretraining on diverse datasets. However, pretraining on such datasets introduces significant challenges due to substantial mismatches in data distributions, a problem particularly pronounced with time series data. In this paper, we tackle this issue by proposing a domain-aware adaptive normalization strategy within the Transformer architecture. Specifically, we replace the traditional LayerNorm with a prototype-guided dynamic normalization mechanism (ProtoNorm), where learned prototypes encapsulate distinct data distributions, and sample-to-prototype affinity determines the appropriate normalization layer. This mechanism effectively captures the heterogeneity of time series characteristics, aligning pretrained representations with downstream tasks. Through comprehensive empirical evaluation, we demonstrate that our method significantly outperforms conventional pretraining techniques across both classification and forecasting tasks, while effectively mitigating the adverse effects of distribution shifts during pretraining. Incorporating ProtoNorm is as simple as replacing a single line of code. Extensive experiments on diverse real-world time series benchmarks validate the robustness and generalizability of our approach, advancing the development of more versatile time series foundation models.
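A hedged sketch of a prototype-guided normalization layer in this spirit: K learned prototypes each own a LayerNorm, and a sample's affinity to the prototypes softly mixes their outputs. The affinity measure, prototype count, and time-pooling of statistics below are assumptions, not the paper's exact mechanism.

```python
# Sketch: drop-in ProtoNorm-style replacement for LayerNorm.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProtoNorm(nn.Module):
    def __init__(self, dim: int, num_prototypes: int = 4):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(num_prototypes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); pool over time to get one descriptor per sample
        desc = x.mean(dim=1)                                     # (batch, dim)
        affinity = F.softmax(
            F.normalize(desc, dim=-1) @ F.normalize(self.prototypes, dim=-1).T,
            dim=-1,
        )                                                        # (batch, K)
        stacked = torch.stack([norm(x) for norm in self.norms])  # (K, batch, T, dim)
        return torch.einsum("bk,kbtd->btd", affinity, stacked)

x = torch.randn(8, 96, 64)
print(ProtoNorm(64)(x).shape)  # torch.Size([8, 96, 64])
```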
https://arxiv.org/abs/2504.10900
Clinical case reports encode rich, temporal patient trajectories that are often underexploited by traditional machine learning methods relying on structured data. In this work, we introduce the forecasting problem from textual time series, where timestamped clinical findings, extracted via an LLM-assisted annotation pipeline, serve as the primary input for prediction. We systematically evaluate a diverse suite of models, including fine-tuned decoder-based large language models and encoder-based transformers, on tasks of event occurrence prediction, temporal ordering, and survival analysis. Our experiments reveal that encoder-based models consistently achieve higher F1 scores and superior temporal concordance for short- and long-horizon event forecasting, while fine-tuned masking approaches enhance ranking performance. In contrast, instruction-tuned decoder models demonstrate a relative advantage in survival analysis, especially in early prognosis settings. Our sensitivity analyses further demonstrate the importance of time ordering, which requires constructing a clinical time series, as compared to text ordering, the input format on which LLMs are classically trained. This highlights the additional benefit that can be ascertained from time-ordered corpora, with implications for temporal tasks in the era of widespread LLM use.
https://arxiv.org/abs/2504.10340
Multi-agent-based news-driven time series forecasting is considered a potential paradigm shift in the era of large language models (LLMs). The challenge of this task lies in measuring the influence of different news events on the fluctuations of time series. This requires agents to possess stronger capabilities for innovative thinking and for identifying misleading logic. However, existing multi-agent discussion frameworks do little to optimize these two capabilities and thus offer limited gains for time series prediction. Inspired by the role of competition in fostering innovation, this study embeds a competition mechanism within the multi-agent discussion to enhance agents' capability of generating innovative thoughts. Furthermore, to bolster the model's proficiency in identifying misleading information, we incorporate a fine-tuned small-scale LLM within the reflective stage, offering auxiliary decision-making support. Experimental results confirm that competition can boost agents' capacity for innovative thinking, which can significantly improve the performance of time series prediction. Consistent with findings in social science, the intensity of competition within this framework influences the performances of agents, providing a new perspective for studying LLM-based multi-agent systems.
https://arxiv.org/abs/2504.10210
Standard multimodal self-supervised learning (SSL) algorithms regard cross-modal synchronization as implicit supervisory labels during pretraining, thus placing high demands on the scale and quality of multimodal samples. These constraints significantly limit the performance of sensing intelligence in IoT applications, as the heterogeneity and non-interpretability of time-series signals result in abundant unimodal data but scarce high-quality multimodal pairs. This paper proposes InfoMAE, a cross-modal alignment framework that tackles the challenge of multimodal pair efficiency under the SSL setting by facilitating efficient cross-modal alignment of pretrained unimodal representations. InfoMAE achieves efficient cross-modal alignment with limited data pairs through a novel information theory-inspired formulation that simultaneously addresses distribution-level and instance-level alignment. Extensive experiments on two real-world IoT applications evaluate InfoMAE's pairing efficiency in bridging pretrained unimodal models into a cohesive joint multimodal model. InfoMAE enhances downstream multimodal tasks by over 60% with significantly improved multimodal pairing efficiency. It also improves unimodal task accuracy by an average of 22%.
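The two alignment levels can be sketched with assumed loss forms: a distribution-level term matching batch statistics of the two modalities, and an instance-level InfoNCE term over the limited paired samples. InfoMAE's actual information-theoretic formulation differs; this only illustrates the structure.

```python
# Sketch: distribution-level plus instance-level cross-modal alignment losses.
import torch
import torch.nn.functional as F

def distribution_alignment(za: torch.Tensor, zb: torch.Tensor) -> torch.Tensor:
    # Match first and second moments of the two embedding distributions.
    mean_term = (za.mean(0) - zb.mean(0)).pow(2).sum()
    cov_term = (torch.cov(za.T) - torch.cov(zb.T)).pow(2).sum()
    return mean_term + cov_term

def instance_alignment(za: torch.Tensor, zb: torch.Tensor, tau: float = 0.1):
    # InfoNCE over true pairs: row i of za is paired with row i of zb.
    logits = F.normalize(za, dim=-1) @ F.normalize(zb, dim=-1).T / tau
    labels = torch.arange(za.size(0))
    return F.cross_entropy(logits, labels)

za, zb = torch.randn(32, 128), torch.randn(32, 128)  # pretrained unimodal embeddings
loss = distribution_alignment(za, zb) + instance_alignment(za, zb)
print(loss.item())
```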
https://arxiv.org/abs/2504.09707
When applying pre-trained large language models (LLMs) to anomaly detection tasks, the multivariate time series (MTS) modality of anomaly detection does not align with the text modality of LLMs. Existing methods simply transform the MTS data into multiple univariate time series sequences, which discards the correlations between different features and causes other problems. This paper introduces MADLLM, a novel multivariate anomaly detection method via pre-trained LLMs. We design a new triple encoding technique to align the MTS modality with the text modality of LLMs. Specifically, this technique integrates the traditional patch embedding method with two novel embedding approaches: Skip Embedding, which alters the order of patch processing in traditional methods to help LLMs retain knowledge of previous features, and Feature Embedding, which leverages contrastive learning to allow the model to better understand the correlations between different features. Experimental results demonstrate that our method outperforms state-of-the-art methods on various public anomaly detection datasets.
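One way to picture the patch reordering is a strided "skip" visit order over patches, sketched below. The stride, patch length, and the specific reordering are speculative assumptions about Skip Embedding, not the paper's definition.

```python
# Sketch: patchify one channel of an MTS sample, then visit patches in a
# strided order instead of strictly left to right.
import numpy as np

def patchify(series: np.ndarray, patch_len: int) -> np.ndarray:
    n = len(series) // patch_len
    return series[: n * patch_len].reshape(n, patch_len)

def skip_order(num_patches: int, skip: int = 2) -> list:
    # e.g., 8 patches with skip=2 -> [0, 2, 4, 6, 1, 3, 5, 7]
    return [i for start in range(skip) for i in range(start, num_patches, skip)]

x = np.arange(32, dtype=float)      # one variable of an MTS sample
patches = patchify(x, patch_len=4)  # (8, 4)
order = skip_order(len(patches))
print(order)
print(patches[order][:2])           # patches 0 and 2 are now adjacent
```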
https://arxiv.org/abs/2504.09504
Ensuring maritime safety and optimizing traffic management in increasingly crowded and complex waterways require effective waterway monitoring. However, current methods struggle with challenges arising from multimodal data, such as dimensional disparities, mismatched target counts, vessel scale variations, occlusions, and asynchronous data streams from systems like the automatic identification system (AIS) and closed-circuit television (CCTV). Traditional multi-target association methods often falter under these complexities, particularly in densely trafficked waterways. To overcome these issues, we propose a graph learning-driven multi-vessel association (GMvA) method tailored for maritime multimodal data fusion. By integrating AIS and CCTV data, GMvA leverages time series learning and graph neural networks to capture the spatiotemporal features of vessel trajectories effectively. To enhance feature representation, the proposed method incorporates temporal graph attention and spatiotemporal attention, effectively capturing both local and global vessel interactions. Furthermore, a multi-layer perceptron-based uncertainty fusion module computes robust similarity scores, and the Hungarian algorithm is adopted to ensure globally consistent and accurate target matching. Extensive experiments on real-world maritime datasets confirm that GMvA delivers superior accuracy and robustness in multi-target association, outperforming existing methods even in challenging scenarios with high vessel density and incomplete or unevenly distributed AIS and CCTV data.
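The final matching step is standard and easy to show: given pairwise AIS-vs-CCTV similarity scores (random stand-ins below for the MLP fusion module's output), the Hungarian algorithm yields a globally consistent one-to-one assignment.

```python
# Sketch: globally consistent track matching via the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

similarity = np.random.default_rng(0).random((5, 5))  # AIS tracks x CCTV tracks
rows, cols = linear_sum_assignment(-similarity)       # negate to maximize similarity
for ais_id, cctv_id in zip(rows, cols):
    print(f"AIS track {ais_id} <-> CCTV track {cctv_id} "
          f"(score {similarity[ais_id, cctv_id]:.2f})")
```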
https://arxiv.org/abs/2504.09197
Long sequence prediction is a key challenge in time series forecasting. While Mamba-based models have shown strong performance due to their sequence selection capabilities, they still struggle with insufficient focus on critical time steps and incomplete noise suppression, caused by limited selective abilities. To address this, we introduce Repetitive Contrastive Learning (RCL), a token-level contrastive pretraining framework aimed at enhancing Mamba's selective capabilities. RCL pretrains a single Mamba block to strengthen its selective abilities and then transfers these pretrained parameters to initialize Mamba blocks in various backbone models, improving their temporal prediction performance. RCL uses sequence augmentation with Gaussian noise and applies inter-sequence and intra-sequence contrastive learning to help the Mamba module prioritize information-rich time steps while ignoring noisy ones. Extensive experiments show that RCL consistently boosts the performance of backbone models, surpassing existing methods and achieving state-of-the-art results. Additionally, we propose two metrics to quantify Mamba's selective capabilities, providing theoretical, qualitative, and quantitative evidence for the improvements brought by RCL.
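A hedged sketch of the pretraining signal: make two Gaussian-noise copies of a sequence, treat the same time step across copies as a positive pair and other steps as negatives, in an InfoNCE form. RCL's exact inter-sequence and intra-sequence losses differ; the linear encoder below is a stand-in for a single Mamba block.

```python
# Sketch: token-level contrastive loss over two noise-augmented views.
import torch
import torch.nn.functional as F

def token_contrastive(encoder, x: torch.Tensor, sigma: float = 0.1, tau: float = 0.1):
    # x: (batch, time, dim) raw series; encoder maps it to representations
    z1 = encoder(x + sigma * torch.randn_like(x)).flatten(0, 1)  # (B*T, d)
    z2 = encoder(x + sigma * torch.randn_like(x)).flatten(0, 1)
    logits = F.normalize(z1, dim=-1) @ F.normalize(z2, dim=-1).T / tau
    labels = torch.arange(z1.size(0))  # positive = same token in the other view
    return F.cross_entropy(logits, labels)

encoder = torch.nn.Linear(8, 32)  # stand-in for a single Mamba block
x = torch.randn(4, 24, 8)
print(token_contrastive(encoder, x).item())
```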
https://arxiv.org/abs/2504.09185
Street-view images offer unique advantages for disaster damage estimation as they capture impacts from a visual perspective and provide detailed, on-the-ground insights. Although several investigations have analyzed street-view images for damage estimation, they mainly focus on post-disaster images. The potential of time-series street-view images remains underexplored. Pre-disaster images provide valuable benchmarks for accurate damage estimations at building and street levels. These images could aid annotators in objectively labeling post-disaster impacts, improving the reliability of labeled data sets for model training, and potentially enhancing the model performance in damage evaluation. The goal of this study is to estimate hyperlocal, on-the-ground disaster damages using bi-temporal street-view images and advanced pre-trained vision models. Street-view images before and after Hurricane Milton (2024) in Horseshoe Beach, Florida, were collected for experiments. The objectives are: (1) to assess the performance gains of incorporating pre-disaster street-view images as a no-damage category in fine-tuning pre-trained models, including Swin Transformer and ConvNeXt, for damage level classification; (2) to design and evaluate a dual-channel algorithm that reads pair-wise pre- and post-disaster street-view images for hyperlocal damage assessment. The results indicate that incorporating pre-disaster street-view images and employing a dual-channel processing framework can significantly enhance damage assessment accuracy. The accuracy improves from 66.14% with the Swin Transformer baseline to 77.11% with the dual-channel Feature-Fusion ConvNeXt model. This research enables rapid, operational damage assessments at hyperlocal spatial resolutions, providing valuable insights to support effective decision-making in disaster management and resilience planning.
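A minimal sketch of a dual-channel feature-fusion design of this kind: two ConvNeXt encoders, one per time point, with concatenated features feeding a classifier. Fusion by concatenation, the class count, and the untrained weights are assumptions about the paper's model.

```python
# Sketch: pre-/post-disaster dual-channel ConvNeXt with feature-level fusion.
import torch
import torch.nn as nn
from torchvision.models import convnext_tiny

class DualChannelFusion(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        def encoder():
            m = convnext_tiny(weights=None)
            return nn.Sequential(m.features, m.avgpool, nn.Flatten())
        self.pre_encoder = encoder()    # pre-disaster stream
        self.post_encoder = encoder()   # post-disaster stream
        self.head = nn.Linear(2 * 768, num_classes)  # 768 = ConvNeXt-Tiny width

    def forward(self, pre: torch.Tensor, post: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.pre_encoder(pre), self.post_encoder(post)], dim=1)
        return self.head(fused)

model = DualChannelFusion()
pre = torch.randn(2, 3, 224, 224)
post = torch.randn(2, 3, 224, 224)
print(model(pre, post).shape)  # torch.Size([2, 4])
```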
https://arxiv.org/abs/2504.09066
Clinical case reports and discharge summaries may be the most complete and accurate summarization of patient encounters, yet they are finalized, i.e., timestamped after the encounter. Complementary structured data streams become available sooner but suffer from incompleteness. To train models and algorithms on more complete and temporally fine-grained data, we construct a pipeline to phenotype, extract, and annotate time-localized findings within case reports using large language models. We apply our pipeline to generate an open-access textual time series corpus for Sepsis-3 comprising 2,139 case reports from the PubMed Open Access (PMOA) Subset. To validate our system, we apply it on PMOA and timeline annotations from I2B2/MIMIC-IV and compare the results to physician-expert annotations. We show high recovery rates of clinical findings (event match rates: O1-preview 0.755, Llama 3.3 70B Instruct 0.753) and strong temporal ordering (concordance: O1-preview 0.932, Llama 3.3 70B Instruct 0.932). Our work characterizes the ability of LLMs to time-localize clinical findings in text, illustrating the limitations of LLM use for temporal reconstruction and providing several potential avenues of improvement via multimodal integration.
https://arxiv.org/abs/2504.12326
In this paper, we introduce a new approach to multivariate cryptocurrency price forecasting using a hybrid contextual model combining exponential smoothing (ES) and a recurrent neural network (RNN). The model consists of two tracks: the context track and the main track. The context track provides additional information to the main track, extracted from representative series. This information, as well as information extracted from exogenous variables, is dynamically adjusted to the individual series forecasted by the main track. The stacked RNN architecture with hierarchical dilations, incorporating recently developed attentive dilated recurrent cells, allows the model to capture short- and long-term dependencies across time series and dynamically weight input information. The model generates both daily point forecasts and predictive intervals for one-day, one-week, and four-week horizons. We apply our model to forecast the prices of 15 cryptocurrencies based on 17 input variables and compare its performance with comparative models, both statistical and ML-based.
https://arxiv.org/abs/2504.08947
In this paper, we investigate meta-learning for combining forecasts generated by models of different types. While typical approaches to combining forecasts involve simple averaging, machine learning techniques enable more sophisticated combination through meta-learning, leading to improved forecasting accuracy. We use linear regression, $k$-nearest neighbors, multilayer perceptron, random forest, and long short-term memory as meta-learners. We define global and local meta-learning variants for time series with complex seasonality and compare meta-learners on multiple forecasting problems, demonstrating their superior performance compared to simple averaging.
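The combination step is simple to show: base-model forecasts become features, and a meta-learner (here linear regression, one of the meta-learners the paper names) learns the combination weights. The synthetic data and base models below are stand-ins, not the paper's setup.

```python
# Sketch: stacking base forecasts with a linear-regression meta-learner
# and comparing against the simple average.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 12, 300)) + 0.1 * rng.standard_normal(300)
# Stand-in forecasts from three base models with different noise levels.
base = np.column_stack([y + rng.normal(0, s, len(y)) for s in (0.2, 0.3, 0.5)])

train, test = slice(0, 250), slice(250, 300)
meta = LinearRegression().fit(base[train], y[train])
combined = meta.predict(base[test])
simple_avg = base[test].mean(axis=1)
print("meta-learner MSE  :", np.mean((combined - y[test]) ** 2))
print("simple average MSE:", np.mean((simple_avg - y[test]) ** 2))
```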
https://arxiv.org/abs/2504.08940
Deep learning-based electrocardiogram (ECG) classification has shown impressive performance but clinical adoption has been slowed by the lack of transparent and faithful explanations. Post hoc methods such as saliency maps may fail to reflect a model's true decision process. Prototype-based reasoning offers a more transparent alternative by grounding decisions in similarity to learned representations of real ECG segments, enabling faithful, case-based explanations. We introduce ProtoECGNet, a prototype-based deep learning model for interpretable, multi-label ECG classification. ProtoECGNet employs a structured, multi-branch architecture that reflects clinical interpretation workflows: it integrates a 1D CNN with global prototypes for rhythm classification, a 2D CNN with time-localized prototypes for morphology-based reasoning, and a 2D CNN with global prototypes for diffuse abnormalities. Each branch is trained with a prototype loss designed for multi-label learning, combining clustering, separation, diversity, and a novel contrastive loss that encourages appropriate separation between prototypes of unrelated classes while allowing clustering for frequently co-occurring diagnoses. We evaluate ProtoECGNet on all 71 diagnostic labels from the PTB-XL dataset, demonstrating competitive performance relative to state-of-the-art black-box models while providing structured, case-based explanations. To assess prototype quality, we conduct a structured clinician review of the final model's projected prototypes, finding that they are rated as representative and clear. ProtoECGNet shows that prototype learning can be effectively scaled to complex, multi-label time-series classification, offering a practical path toward transparent and trustworthy deep learning models for clinical decision support.
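Two terms of the multi-label prototype loss can be sketched compactly: a clustering term pulling each sample toward the nearest prototype of each positive class, and a separation term pushing it away from prototypes of negative classes. The margin value and the omitted diversity/contrastive terms are assumptions; this is not the paper's full objective.

```python
# Sketch: clustering + separation terms of a multi-label prototype loss.
import torch

def prototype_loss(z, prototypes, labels, margin: float = 1.0):
    # z: (B, d) embeddings; prototypes: (C, P, d) per-class prototypes;
    # labels: (B, C) multi-hot diagnosis labels
    d = torch.cdist(z, prototypes.flatten(0, 1))   # (B, C*P) distances
    d = d.view(z.size(0), *prototypes.shape[:2])   # (B, C, P)
    nearest = d.min(dim=-1).values                 # (B, C) nearest prototype per class
    cluster = (nearest * labels).sum() / labels.sum().clamp(min=1)
    separation = (torch.relu(margin - nearest) * (1 - labels)).sum() \
        / (1 - labels).sum().clamp(min=1)
    return cluster + separation

z = torch.randn(16, 128)
prototypes = torch.randn(71, 5, 128)  # 71 PTB-XL labels, 5 prototypes each (assumed)
labels = (torch.rand(16, 71) < 0.05).float()
print(prototype_loss(z, prototypes, labels).item())
```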
https://arxiv.org/abs/2504.08713