Paper Reading AI Learner

Frequency Domain Convolutional Neural Network: Accelerated CNN for Large Diabetic Retinopathy Image Classification

2021-06-24 02:52:54
Ee Fey Goh, ZhiYuan Chen, Wei Xiang Lim

Abstract

The conventional spatial convolution layers in the Convolutional Neural Networks (CNNs) are computationally expensive at the point where the training time could take days unless the number of layers, the number of training images or the size of the training images are reduced. The image size of 256x256 pixels is commonly used for most of the applications of CNN, but this image size is too small for applications like Diabetic Retinopathy (DR) classification where the image details are important for accurate classification. This research proposed Frequency Domain Convolution (FDC) and Frequency Domain Pooling (FDP) layers which were built with RFFT, kernel initialization strategy, convolution artifact removal and Channel Independent Convolution (CIC) to replace the conventional convolution and pooling layers. The FDC and FDP layers are used to build a Frequency Domain Convolutional Neural Network (FDCNN) to accelerate the training of large images for DR classification. The Full FDC layer is an extension of the FDC layer to allow direct use in conventional CNNs, it is also used to modify the VGG16 architecture. FDCNN is shown to be at least 54.21% faster and 70.74% more memory efficient compared to an equivalent CNN architecture. The modified VGG16 architecture with Full FDC layer is reported to achieve a shorter training time and a higher accuracy at 95.63% compared to the original VGG16 architecture for DR classification.

Abstract (translated)

URL

https://arxiv.org/abs/2106.12736

PDF

https://arxiv.org/pdf/2106.12736.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot