Paper Reading AI Learner

PDFNet: Pointwise Dense Flow Network for Urban-Scene Segmentation

2021-09-21 10:39:46
Venkata Satya Sai Ajay Daliparthi

Abstract

In recent years, using a deep convolutional neural network (CNN) as a feature encoder (or backbone) has become the most common architectural pattern in computer vision methods, and semantic segmentation is no exception. This pattern has two major drawbacks: (i) the networks often fail to capture small classes such as wall, fence, pole, traffic light, traffic sign, and bicycle, which are crucial for autonomous vehicles to make accurate decisions; and (ii) because of their arbitrarily increasing depth, the networks require large amounts of labeled data to converge and additional regularization techniques to prevent over-fitting. While regularization techniques come at minimal cost, collecting labeled data is an expensive and laborious process. In this work, we address these two drawbacks by proposing a novel lightweight architecture named point-wise dense flow network (PDFNet). In PDFNet, we employ dense, residual, and multiple shortcut connections to allow a smooth gradient flow to all parts of the network. Extensive experiments on the Cityscapes and CamVid benchmarks demonstrate that our method significantly outperforms baselines in capturing small classes and in few-data regimes. Moreover, our method achieves considerable performance in classifying out-of-distribution samples, evaluated in a Cityscapes-to-KITTI setting.
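The abstract does not give PDFNet's actual layer layout. As a rough illustration of how dense and residual connections can be combined, the following is a minimal NumPy sketch of a generic DenseNet-style block built from pointwise (1x1) convolutions, wrapped in a residual shortcut. Everything here is an assumption for illustration: reading "point-wise" as 1x1 convolutions, the hypothetical names `pointwise_conv` and `dense_residual_block`, the growth rate, and the random weights. It is not the paper's architecture and shows only forward-pass shapes, not training.

```python
import numpy as np

def pointwise_conv(x, weight, bias):
    """1x1 convolution: a per-pixel linear map over channels.
    x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def relu(x):
    return np.maximum(x, 0.0)

def dense_residual_block(x, num_layers=3, growth=4, seed=0):
    """Hypothetical block (not PDFNet's): each layer sees the concatenation
    of all earlier feature maps (dense connections), and the block output is
    projected back to the input width and added to the input (residual)."""
    rng = np.random.default_rng(seed)
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)  # dense connectivity
        c_in = inp.shape[0]
        w = rng.standard_normal((growth, c_in)) * np.sqrt(2.0 / c_in)
        b = np.zeros(growth)
        features.append(relu(pointwise_conv(inp, w, b)))
    stacked = np.concatenate(features, axis=0)
    # Pointwise projection back to the input channel count so that the
    # residual shortcut x + f(x) is shape-compatible.
    c_total = stacked.shape[0]
    w_proj = rng.standard_normal((x.shape[0], c_total)) * np.sqrt(1.0 / c_total)
    return x + pointwise_conv(stacked, w_proj, np.zeros(x.shape[0]))

x = np.random.default_rng(1).standard_normal((8, 16, 16))
y = dense_residual_block(x)
print(x.shape, y.shape)  # (8, 16, 16) (8, 16, 16)
```

With 8 input channels and a growth rate of 4, the three inner layers see 8, 12, and 16 channels, and the final projection maps the 20 concatenated channels back to 8. Both the concatenation path and the identity shortcut give gradients a direct route to earlier layers, which is the general motivation the abstract cites.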

URL

https://arxiv.org/abs/2109.10083

PDF

https://arxiv.org/pdf/2109.10083.pdf
