
CRA5: Extreme Compression of ERA5 for Portable Global Climate and Weather Research via an Efficient Variational Transformer

2024-05-06 11:30:55
Tao Han, Zhenghao Chen, Song Guo, Wanghan Xu, Lei Bai

Abstract

The advent of data-driven weather forecasting models, which learn from hundreds of terabytes (TB) of reanalysis data, has significantly advanced forecasting capabilities. However, the substantial costs of data storage and transmission present a major challenge for data providers and users, particularly for resource-constrained researchers, and limit their ability to participate in AI-based meteorological research. To mitigate this issue, we introduce an efficient neural codec, the Variational Autoencoder Transformer (VAEformer), for extreme compression of climate data. It significantly reduces data storage cost and makes AI-based meteorological research portable for researchers. Our approach diverges from recent complex neural codecs by using a low-complexity auto-encoder transformer. The encoder produces a quantized latent representation through variance inference, which reparameterizes the latent space as a Gaussian distribution. This improves the estimation of the distributions used for cross-entropy coding. Extensive experiments demonstrate that VAEformer outperforms existing state-of-the-art compression methods on climate data. Using VAEformer, we compressed the most popular ERA5 climate dataset (226 TB) into a new dataset, CRA5 (0.7 TB). This corresponds to a compression ratio of over 300 while retaining the dataset's utility for accurate scientific analysis. Further, downstream experiments show that global weather forecasting models trained on the compact CRA5 dataset achieve forecasting accuracy comparable to that of models trained on the original dataset. Code, the CRA5 dataset, and the pre-trained model are available at this https URL.
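
To make the link between a Gaussian latent model and the cost of entropy coding concrete, here is a minimal sketch in Python. It is not the VAEformer implementation: the function names, the rounding quantizer, and the sample means and scales are all assumptions chosen for illustration. It shows only the general idea that a quantized symbol costs about -log2 of its probability under the predicted Gaussian, so a better distribution estimate means fewer bits, and it checks the ratio implied by the dataset sizes quoted in the abstract.

```python
import math


def gaussian_cdf(x, mean, scale):
    """CDF of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def estimated_bits(symbols, means, scales, floor=1e-9):
    """Approximate entropy-coding cost (in bits) of integer-quantized symbols.

    Each symbol y is assigned the probability mass of the unit-width bin
    [y - 0.5, y + 0.5] under a Gaussian with the predicted mean and scale.
    `floor` is a hypothetical lower bound that keeps log2 finite.
    """
    total = 0.0
    for y, mu, sigma in zip(symbols, means, scales):
        p = gaussian_cdf(y + 0.5, mu, sigma) - gaussian_cdf(y - 0.5, mu, sigma)
        total += -math.log2(max(p, floor))
    return total


def compression_ratio(original_size, compressed_size):
    """Ratio of original to compressed size (both in the same unit)."""
    return original_size / compressed_size


# Hypothetical continuous latent values, quantized by rounding.
latent = [0.2, -1.7, 3.1, 0.0]
quantized = [round(v) for v in latent]

# A well-matched Gaussian estimate versus a poorly matched one (made-up values).
good_bits = estimated_bits(quantized, means=[0.0, -2.0, 3.0, 0.0], scales=[0.5] * 4)
poor_bits = estimated_bits(quantized, means=[0.0] * 4, scales=[0.5] * 4)

print(f"bits with matched estimate:    {good_bits:.2f}")
print(f"bits with mismatched estimate: {poor_bits:.2f}")
print(f"ERA5 -> CRA5 ratio: {compression_ratio(226.0, 0.7):.1f}")
```

In this toy setting the mismatched estimate costs noticeably more bits for the same symbols, which is the effect a more accurate distribution estimate is meant to avoid. The sizes 226 TB and 0.7 TB come from the abstract; everything else in the block is illustrative.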

URL

https://arxiv.org/abs/2405.03376

PDF

https://arxiv.org/pdf/2405.03376.pdf

