Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective

2024-04-24 12:41:04
Vaisakh Shaj

Abstract

Machines that can replicate human intelligence with Type 2 reasoning capabilities should be able to reason at multiple levels of spatio-temporal abstractions and scales using internal world models. Devising formalisms to develop such internal world models, which accurately reflect the causal hierarchies inherent in the dynamics of the real world, is a critical research challenge in the domains of artificial intelligence and machine learning. This thesis identifies several limitations of the prevalent use of state space models (SSMs) as internal world models and proposes two new probabilistic formalisms, namely Hidden-Parameter SSMs and Multi-Time Scale SSMs, to address these drawbacks. The structure of the graphical models in both formalisms facilitates scalable exact probabilistic inference using belief propagation, as well as end-to-end learning via backpropagation through time. This approach permits the development of scalable, adaptive hierarchical world models capable of representing nonstationary dynamics across multiple temporal abstractions and scales. Moreover, these probabilistic formalisms integrate the concept of uncertainty in world states, thus improving the system's capacity to emulate the stochastic nature of the real world and quantify the confidence in its predictions. The thesis also discusses how these formalisms are in line with related neuroscience literature on the Bayesian brain hypothesis and predictive processing. Our experiments on various real and simulated robots demonstrate that our formalisms can match and in many cases exceed the performance of contemporary transformer variants in making long-range future predictions. We conclude the thesis by reflecting on the limitations of our current models and suggesting directions for future research.

URL

https://arxiv.org/abs/2404.16078

PDF

https://arxiv.org/pdf/2404.16078.pdf

