Paper Reading AI Learner

InstructTime++: Time Series Classification with Multimodal Language Modeling via Implicit Feature Enhancement

2026-01-21 13:12:23
Mingyue Cheng, Xiaoyu Tao, Huajian Zhang, Qi Liu, Enhong Chen

Abstract

Most existing time series classification methods adopt a discriminative paradigm that maps input sequences directly to one-hot encoded class labels. While effective, this paradigm struggles to incorporate contextual features and fails to capture semantic relationships among classes. To address these limitations, we propose InstructTime, a novel framework that reformulates time series classification as a multimodal generative task. Specifically, continuous numerical sequences, contextual textual features, and task instructions are treated as multimodal inputs, while class labels are generated as textual outputs by fine-tuned language models. To bridge the modality gap, InstructTime introduces a time series discretization module that converts continuous sequences into discrete temporal tokens, together with an alignment projection layer and a generative self-supervised pre-training strategy that strengthens cross-modal representation alignment. Building upon this framework, we further propose InstructTime++, which extends InstructTime with implicit feature modeling to compensate for the limited inductive bias of language models. InstructTime++ leverages specialized toolkits to mine informative implicit patterns from raw time series and contextual inputs, including statistical feature extraction and vision-language-based image captioning, and translates them into textual descriptions for seamless integration. Extensive experiments on multiple benchmark datasets demonstrate the superior performance of InstructTime++.
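The pipeline the abstract describes — discretizing a continuous series into temporal tokens and translating implicit statistical features into text for the language-model prompt — can be sketched as follows. This is a minimal illustration, not the paper's actual method: it assumes simple quantile binning as the discretization step (the paper's learned discretization module may differ), and the prompt template and statistics chosen here are hypothetical.

```python
import random
import statistics

def discretize_series(x, n_tokens=16):
    # Quantile binning: map each value to one of n_tokens discrete symbols.
    # A stand-in for the paper's time series discretization module.
    edges = statistics.quantiles(x, n=n_tokens)  # n_tokens - 1 cut points
    return [sum(v > e for e in edges) for v in x]

def describe_statistics(x):
    # Statistical feature extraction, rendered as a textual description
    # so it can be integrated into the multimodal prompt.
    return (f"mean={statistics.mean(x):.3f}, std={statistics.stdev(x):.3f}, "
            f"min={min(x):.3f}, max={max(x):.3f}")

random.seed(0)
series = [random.gauss(0, 1) for _ in range(128)]
tokens = discretize_series(series)

# Instruction + contextual text + temporal tokens as one generative input;
# the tuned language model would then emit the class label as text.
prompt = ("Instruction: classify this time series. "
          f"Context: {describe_statistics(series)}. "
          f"Tokens: {' '.join(map(str, tokens[:10]))} ...")
```

In the full framework, the token sequence would pass through an alignment projection layer before reaching the language model; here the tokens are simply spliced into the prompt to show the generative reformulation.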

URL

https://arxiv.org/abs/2601.14968

PDF

https://arxiv.org/pdf/2601.14968.pdf
