Paper Reading AI Learner

Hiking in the Wild: A Scalable Perceptive Parkour Framework for Humanoids

2026-01-12 16:50:50
Shaoting Zhu, Ziwen Zhuang, Mengjie Zhao, Kun-Ying Lee, Hang Zhao

Abstract

Achieving robust humanoid hiking in complex, unstructured environments requires transitioning from reactive proprioception to proactive perception. Integrating exteroception, however, remains a significant challenge: mapping-based methods suffer from state-estimation drift (LiDAR-based pipelines, for instance, handle torso jitter poorly), while existing end-to-end approaches often struggle with scalability and training complexity, since prior work based on virtual obstacles must be implemented case by case. In this work, we present "Hiking in the Wild", a scalable, end-to-end perceptive parkour framework designed for robust humanoid hiking. To ensure safety and training stability, we introduce two key mechanisms: a foothold-safety mechanism that combines scalable Terrain Edge Detection with Foot Volume Points to prevent catastrophic slippage on edges, and a Flat Patch Sampling strategy that mitigates reward hacking by generating feasible navigation targets. Our approach uses a single-stage reinforcement-learning scheme that maps raw depth inputs and proprioception directly to joint actions, without relying on external state estimation. Extensive field experiments on a full-size humanoid demonstrate that our policy enables robust traversal of complex terrain at speeds up to 2.5 m/s. The training and deployment code is open-sourced to facilitate reproducible research and deployment on real robots with minimal hardware modifications.
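The two safety mechanisms described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's released implementation: all function names, parameters, and the grid-based terrain representation are assumptions. It samples "foot volume points" over the sole, scores how many land on detected edge cells, and picks navigation targets only from locally flat patches of a height map.

```python
import numpy as np

def foot_volume_points(foot_pos, foot_yaw, half_len=0.09, half_wid=0.04, n=4):
    """Sample an n x n grid of points over the foot sole ("foot volume points").

    Returns world-frame xy coordinates of the sampled sole points.
    Foot dimensions here are illustrative guesses.
    """
    xs = np.linspace(-half_len, half_len, n)
    ys = np.linspace(-half_wid, half_wid, n)
    gx, gy = np.meshgrid(xs, ys)
    local = np.stack([gx.ravel(), gy.ravel()], axis=-1)   # (n*n, 2) in foot frame
    c, s = np.cos(foot_yaw), np.sin(foot_yaw)
    rot = np.array([[c, -s], [s, c]])                     # yaw rotation to world
    return foot_pos[:2] + local @ rot.T

def edge_safety_penalty(points_xy, edge_mask, origin, cell=0.05):
    """Fraction of sole points that fall on terrain-edge cells.

    `edge_mask` is a boolean grid from some edge detector over the height map;
    a nonzero value flags a foothold at risk of slipping off an edge.
    """
    idx = np.floor((points_xy - origin) / cell).astype(int)
    idx = np.clip(idx, 0, np.array(edge_mask.shape) - 1)
    near_edge = edge_mask[idx[:, 0], idx[:, 1]]
    return float(near_edge.mean())

def sample_flat_patch(height_map, patch=3, max_dev=0.02, rng=None):
    """Pick a navigation-target cell whose local patch is nearly flat.

    Restricting goals to flat patches keeps targets feasible, which is one way
    to curb reward hacking from unreachable commands. Returns None if no
    sufficiently flat patch exists.
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W = height_map.shape
    candidates = []
    for i in range(H - patch + 1):
        for j in range(W - patch + 1):
            win = height_map[i:i + patch, j:j + patch]
            if win.max() - win.min() < max_dev:           # height deviation test
                candidates.append((i + patch // 2, j + patch // 2))
    if not candidates:
        return None
    return candidates[rng.integers(len(candidates))]
```

In training, the edge penalty would be added to the reward whenever a foot makes contact, and flat-patch sampling would replace uniform goal sampling; both are cheap grid lookups, which is what makes them scale across terrain types.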

URL

https://arxiv.org/abs/2601.07718

PDF

https://arxiv.org/pdf/2601.07718.pdf

