Paper Reading AI Learner

Alphazzle: Jigsaw Puzzle Solver with Deep Monte-Carlo Tree Search

2023-02-01 11:41:21
Marie-Morgane Paumard, Hedi Tabia, David Picard

Abstract

Solving jigsaw puzzles requires grasping the visual features of a sequence of patches and efficiently exploring a solution space that grows exponentially with the sequence length. Therefore, visual deep reinforcement learning (DRL) should address this problem more efficiently than optimization solvers coupled with neural networks. Based on this assumption, we introduce Alphazzle, a reassembly algorithm based on single-player Monte Carlo Tree Search (MCTS). A major difference from DRL algorithms is that no game reward is available to MCTS, and we show how to estimate it from the visual input with neural networks. This constraint is induced by the puzzle-solving task and dramatically adds to the task complexity (and interest!). We perform an in-depth ablation study that shows the importance of MCTS and the neural networks working together. We achieve excellent results and gain exciting insights into the combination of DRL and visual feature learning.
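The abstract's central idea is an MCTS whose reward comes from an estimate rather than from the game itself. Below is a minimal sketch of that idea, not the authors' implementation. It assumes a 1-D puzzle whose fragments are placed left to right, so a state is the tuple of fragments placed so far. The function `estimate_value` is a hypothetical stand-in for the paper's learned value network; here it is a toy pairwise-compatibility score built by the hypothetical helper `make_toy_estimator`. The search loop `mcts_reassemble` (also a hypothetical name) follows the generic AlphaZero-style pattern: UCT selection, expansion, leaf evaluation without random rollouts, and backpropagation.

```python
import math
from typing import Callable, Dict, Optional, Sequence, Tuple

State = Tuple[int, ...]  # fragments placed so far, left to right (assumed 1-D layout)


class Node:
    """One search-tree node: a partial assembly plus visit statistics."""

    def __init__(self, state: State, parent: Optional["Node"] = None) -> None:
        self.state = state
        self.parent = parent
        self.children: Dict[int, "Node"] = {}  # next fragment -> child node
        self.expanded = False
        self.visits = 0
        self.value_sum = 0.0

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node: Node, c: float) -> Node:
    """UCT selection; unvisited children are tried first (ties: insertion order)."""

    def uct(child: Node) -> float:
        if child.visits == 0:
            return math.inf
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return child.mean_value() + explore

    return max(node.children.values(), key=uct)


def mcts_reassemble(
    n_fragments: int,
    estimate_value: Callable[[State], float],
    n_simulations: int = 2000,
    c: float = 1.4,
) -> Tuple[State, float]:
    """Single-player MCTS; returns the best complete assembly seen and its value."""
    root = Node(())
    best_state: State = ()
    best_value = -math.inf
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend through expanded nodes until a leaf.
        while node.expanded and node.children:
            node = select_child(node, c)
        # 2. Expansion: one child per fragment not yet placed.
        if not node.expanded:
            placed = set(node.state)
            for frag in range(n_fragments):
                if frag not in placed:
                    node.children[frag] = Node(node.state + (frag,), node)
            node.expanded = True
        # 3. Evaluation: no game reward exists, so query the estimator
        #    (stand-in for a learned value network; no random rollout).
        value = estimate_value(node.state)
        if len(node.state) == n_fragments and value > best_value:
            best_state, best_value = node.state, value
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return best_state, best_value


def make_toy_estimator(compat: Sequence[Sequence[float]]) -> Callable[[State], float]:
    """Hypothetical estimator: mean compatibility of adjacent placed fragments."""
    n = len(compat)

    def estimate_value(state: State) -> float:
        pairs = zip(state, state[1:])
        return sum(compat[a][b] for a, b in pairs) / (n - 1)

    return estimate_value


if __name__ == "__main__":
    # Toy puzzle: fragment i fits only immediately to the left of fragment i + 1.
    compat = [[1.0 if j == i + 1 else 0.0 for j in range(4)] for i in range(4)]
    print(mcts_reassemble(4, make_toy_estimator(compat)))
```

On this toy puzzle the script prints `((0, 1, 2, 3), 1.0)`. Returning the best complete assembly encountered during search is one common choice for single-player MCTS; committing moves one at a time by visit count is another. In a real image puzzle the estimator would read visual features of the placed patches, which the toy compatibility matrix only imitates.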

URL

https://arxiv.org/abs/2302.00384

PDF

https://arxiv.org/pdf/2302.00384.pdf
