Abstract
The rapid evolution of satellite-borne Earth Observation (EO) systems has revolutionized terrestrial monitoring and produced petabyte-scale archives. However, the computational and storage requirements of global-scale analysis often preclude widespread use of these archives and hinder planetary-scale studies. To address these barriers, we present Embedded Seamless Data (ESD), an ultra-lightweight, 30-m global Earth embedding database spanning the 25 years from 2000 to 2024. ESD transforms high-dimensional, multi-sensor observations from the Landsat series (5, 7, 8, and 9) and MODIS Terra into information-dense, quantized latent vectors, distilling essential geophysical and semantic features into a unified latent space. Using the ESDNet architecture and Finite Scalar Quantization (FSQ), the dataset achieves a ~340-fold reduction in data volume compared to raw archives. With this compression, the entire global land surface for a single year fits within approximately 2.4 TB, enabling decadal-scale global analysis on standard local workstations. Validation shows high reconstruction fidelity (mean absolute error, MAE: 0.0130; root-mean-square error, RMSE: 0.0179; correlation coefficient, CC: 0.8543). By condensing the annual phenological cycle into 12 temporal steps, the embeddings provide inherent denoising and a semantically organized space that outperforms raw reflectance in land-cover classification, achieving 79.74% accuracy (vs. 76.92% for raw fusion). With robust few-shot learning capabilities and longitudinal consistency, ESD provides a versatile foundation for democratizing planetary-scale research and advancing next-generation geospatial artificial intelligence.
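To make the quantization step concrete, the following is a minimal sketch of Finite Scalar Quantization in general, not of ESDNet itself. The abstract does not state the latent dimensionality or the number of levels per dimension, so the `levels` values below are purely illustrative, and the function names (`fsq_quantize`, `codes_to_index`, `bits_per_vector`) are hypothetical. For simplicity the sketch assumes an odd number of levels in every dimension, which keeps the integer grid symmetric around zero.

```python
import math


def fsq_quantize(z, levels):
    """Quantize each latent dimension to one of `levels[i]` integer codes.

    Assumption: every entry of `levels` is odd, so dimension i maps onto
    the symmetric grid {-half, ..., 0, ..., +half} with half = (L - 1) // 2.
    """
    codes = []
    for value, n_levels in zip(z, levels):
        half = (n_levels - 1) // 2
        bounded = half * math.tanh(value)  # squash into (-half, +half)
        codes.append(round(bounded))       # snap to the nearest grid point
    return codes


def codes_to_index(codes, levels):
    """Pack per-dimension codes into a single integer (mixed-radix)."""
    index = 0
    for code, n_levels in zip(codes, levels):
        half = (n_levels - 1) // 2
        index = index * n_levels + (code + half)  # shift code to 0..L-1
    return index


def bits_per_vector(levels):
    """Storage cost of one quantized vector, in bits."""
    return math.log2(math.prod(levels))


# Illustrative configuration only: 3 latent dimensions, 5 levels each.
levels = [5, 5, 5]
codes = fsq_quantize([0.3, -1.2, 2.5], levels)
print(codes)                          # [1, -2, 2]
print(codes_to_index(codes, levels))  # 79
print(bits_per_vector(levels))        # log2(125), just under 7 bits
```

Because the codebook is implicit (the product of the per-dimension level counts), each embedding can be stored as one small integer rather than a vector of floats, which is the kind of saving that makes a compact global archive plausible.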
URL
https://arxiv.org/abs/2601.11183