Paper Reading AI Learner

Depth Insight -- Contribution of Different Features to Indoor Single-image Depth Estimation

2023-11-16 17:38:21
Yihong Wu, Yuwen Heng, Mahesan Niranjan, Hansung Kim

Abstract

Depth estimation from a single image is a challenging problem in computer vision because binocular disparity and motion information are absent. While impressive performance has been reported in this area recently by end-to-end trained deep neural architectures, it is hard to know what cues in the images are being exploited by these black-box systems. To this end, in this work we quantify the relative contributions of the known cues of depth in a monocular depth estimation setting using an indoor scene data set. Our work uses feature extraction techniques to relate each of the single features of shape, texture, colour and saturation, taken in isolation, to the predicted depth. We find that the shape of objects, extracted by edge detection, contributes substantially more than the other features in the indoor setting considered, while the remaining features also contribute to varying degrees. These insights will help optimise depth estimation models, boosting their accuracy and robustness, and promise to broaden the practical applications of vision-based depth estimation. The project code is attached to the supplementary material and will be published on GitHub.
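Note: the paper's code is not reproduced on this page. The snippet below is only a minimal sketch of the feature-isolation idea described in the abstract: splitting an image into single-cue inputs (shape via edge detection, texture, colour, saturation) that could each be fed to a depth estimator in isolation. It assumes OpenCV and NumPy; the input file name, Canny thresholds and blur kernel are illustrative choices, not the authors' configuration.

```python
import cv2
import numpy as np


def extract_isolated_features(image_bgr: np.ndarray) -> dict:
    """Split an image into single-cue inputs (shape, texture, colour,
    saturation). Extraction settings are illustrative assumptions,
    not the authors' pipeline."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)

    # Shape: object outlines via Canny edge detection.
    shape = cv2.Canny(gray, 100, 200)

    # Texture: high-frequency detail left after subtracting a blurred copy.
    texture = cv2.absdiff(gray, cv2.GaussianBlur(gray, (21, 21), 0))

    # Colour: hue channel only, discarding intensity and saturation.
    colour = hsv[:, :, 0]

    # Saturation: saturation channel of the HSV representation.
    saturation = hsv[:, :, 1]

    return {"shape": shape, "texture": texture,
            "colour": colour, "saturation": saturation}


if __name__ == "__main__":
    img = cv2.imread("indoor_scene.png")  # hypothetical input image
    if img is None:
        raise SystemExit("indoor_scene.png not found")
    for name, cue in extract_isolated_features(img).items():
        cv2.imwrite(f"cue_{name}.png", cue)
```

Each single-cue image produced this way could then be passed to a monocular depth estimator, and the resulting depth errors compared to gauge how much each cue contributes on its own.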

URL

https://arxiv.org/abs/2311.10042

PDF

https://arxiv.org/pdf/2311.10042.pdf

