Paper Reading AI Learner

Controlling Steering with Energy-Based Models

2023-01-28 18:27:09
Mykyta Baliesnyi, Ardi Tampuu, Tambet Matiisen

Abstract

So-called implicit behavioral cloning with energy-based models has shown promising results in robotic manipulation tasks. We tested whether the method's advantages carry over to controlling the steering of a real self-driving car with an end-to-end driving model. We performed an extensive comparison of the implicit behavioral cloning approach with explicit baseline approaches, all sharing the same neural network backbone architecture. Baseline explicit models were trained with regression (MAE) loss, classification loss (softmax and cross-entropy on a discretization), or as mixture density networks (MDN). While models using the energy-based formulation performed comparably to baseline approaches in terms of safety-driver interventions, they had a higher whiteness measure, indicating higher jerk. To alleviate this, we present two methods that improve the smoothness of steering. We confirmed that energy-based models handle multimodalities slightly better than simple regression, but this did not translate to significantly better driving ability. We argue that the steering-only road-following task has too few multimodalities to benefit from energy-based models. This shows that applying implicit behavioral cloning to real-world tasks can be challenging, and further investigation is needed to bring out the theoretical advantages of energy-based models.
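The contrast between the explicit and implicit formulations described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the networks are stand-in callables, and the whiteness metric shown is one plausible definition (RMS of the steering command's time derivative); the paper's exact definition may differ.

```python
# Hedged sketch: explicit vs. implicit (energy-based) steering inference.
# All models here are toy stand-ins, not the paper's architecture.
import numpy as np

rng = np.random.default_rng(0)

def explicit_model(observation):
    """Explicit behavioral cloning: map observation -> steering angle directly.
    Stand-in for a regression network trained with MAE loss."""
    return float(np.tanh(observation.mean()))

def energy(observation, action):
    """Implicit formulation: scalar energy for an (observation, action) pair.
    Stand-in for an energy-based model; lower energy = more plausible action."""
    target = np.tanh(observation.mean())
    return (action - target) ** 2

def implicit_inference(observation, n_candidates=256, low=-1.0, high=1.0):
    """Sample candidate steering angles and return the lowest-energy one.
    Implicit BC uses sampling-based optimization at inference time instead
    of a single forward pass."""
    candidates = rng.uniform(low, high, size=n_candidates)
    energies = [energy(observation, a) for a in candidates]
    return float(candidates[int(np.argmin(energies))])

def whiteness(steering_sequence, dt=0.1):
    """One plausible smoothness metric: RMS of the command's time derivative.
    Higher values indicate jerkier steering."""
    diffs = np.diff(np.asarray(steering_sequence)) / dt
    return float(np.sqrt(np.mean(diffs ** 2)))

obs = rng.normal(size=8)
print("explicit:", explicit_model(obs))
print("implicit:", implicit_inference(obs))
print("whiteness:", whiteness([0.0, 0.1, 0.05, 0.2]))
```

With a unimodal toy energy like this, the implicit argmin converges to the same answer as the explicit model; the energy-based formulation only pays off when the action distribution has multiple modes, which the steering-only task largely lacks.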


URL

https://arxiv.org/abs/2301.12264

PDF

https://arxiv.org/pdf/2301.12264.pdf
