GMG: A Video Prediction Method Based on Global Focus and Motion Guided

2025-03-14 11:06:49
Yuhao Du, Hui Liu, Haoxiang Peng, Xinyuan Chen, Chenrong Wu, Jiankai Zhang

Abstract

In recent years, weather forecasting has gained significant attention. However, accurately predicting weather remains a challenge due to the rapid variability of meteorological data and potential teleconnections. Current spatiotemporal forecasting models primarily rely on convolution operations or sliding windows for feature extraction. These methods are limited by the size of the convolutional kernel or sliding window, making it difficult to capture and identify potential teleconnection features in meteorological data. Additionally, weather data often involve non-rigid bodies, whose motion is accompanied by unpredictable deformations, further complicating the forecasting task. In this paper, we propose the GMG model to address these two core challenges. The Global Focus Module, a key component of our model, enhances the global receptive field, while the Motion Guided Module adapts to the growth or dissipation processes of non-rigid bodies. Through extensive evaluations, our method demonstrates competitive performance across various complex tasks, providing a novel approach to improving the predictive accuracy of complex spatiotemporal data.
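
To make the kernel-size limitation concrete, the following is a minimal sketch (not taken from the paper) that computes the receptive field of a plain stack of convolutions along one axis. The helper names `receptive_field` and `layers_to_span`, and the 128-pixel grid width, are assumptions chosen only for illustration; the GMG architecture itself is not reproduced here.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field, in input pixels along one axis, of stacked convolutions.

    Uses the standard recurrence: each layer adds (k - 1) * jump pixels,
    where jump is the product of the strides of all earlier layers.
    """
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf


def layers_to_span(extent, kernel_size=3):
    """Smallest number of stride-1 layers whose receptive field covers `extent` pixels."""
    layers = 0
    while receptive_field([kernel_size] * layers) < extent:
        layers += 1
    return layers


# Four stacked 3x3 convolutions see only a 9-pixel-wide window.
print(receptive_field([3] * 4))  # 9

# Hypothetical 128-pixel-wide grid: stride-1 3x3 layers needed to span it.
print(layers_to_span(128))  # 64
```

Under these assumptions, relating two regions on opposite sides of a 128-pixel grid takes 64 stride-1 3x3 layers, which illustrates why purely local operators struggle with long-range dependencies such as teleconnections.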

URL

https://arxiv.org/abs/2503.11297

PDF

https://arxiv.org/pdf/2503.11297.pdf

