Paper Reading AI Learner

Taming Lookup Tables for Efficient Image Retouching

2024-03-28 08:49:35
Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang

Abstract

The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To this end, we propose the Image Color Enhancement Lookup Table (ICELUT), which adopts LUTs for extremely efficient edge inference, without any convolutional neural network (CNN). During training, we leverage pointwise (1x1) convolutions to extract color information, alongside a split fully connected layer to incorporate global information. Both components are then seamlessly converted into LUTs for hardware-agnostic deployment. ICELUT achieves near-state-of-the-art performance at remarkably low power consumption. We observe that the pointwise network structure exhibits robust scalability, maintaining performance even with a heavily downsampled 32x32 input image. This enables ICELUT, the first purely LUT-based image enhancer, to reach an unprecedented speed of 0.4 ms on GPU and 7 ms on CPU, at least one order of magnitude faster than any CNN solution. Code is available at this https URL.
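The key observation behind converting a pointwise network into a LUT is that a 1x1 convolution maps each pixel's color independently of its neighbors, so the whole network can be precomputed over a quantized color grid. The sketch below illustrates this idea only; `pointwise_net`, the 17-bin grid size, and the nearest-neighbor lookup are illustrative assumptions, not ICELUT's actual architecture, table sizes, or interpolation scheme.

```python
import numpy as np

# A toy stand-in for a trained pointwise network: any function that maps
# one RGB value to another, independently per pixel (here a small channel
# mix followed by a mild gamma curve).
def pointwise_net(rgb):  # rgb: (..., 3) floats in [0, 1]
    mix = np.array([[0.9, 0.05, 0.05],
                    [0.05, 0.9, 0.05],
                    [0.05, 0.05, 0.9]])
    return np.clip(rgb @ mix, 0.0, 1.0) ** 0.9

# Bake the network into a 3D LUT by evaluating it on every grid color.
N = 17  # bins per channel; a hypothetical size, not ICELUT's
grid = np.linspace(0.0, 1.0, N)
r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
lut = pointwise_net(np.stack([r, g, b], axis=-1))  # shape (N, N, N, 3)

# Inference is now pure table lookup, with no network evaluation.
# (Nearest-neighbor for brevity; deployments typically interpolate.)
def apply_lut(image):  # image: (H, W, 3) floats in [0, 1]
    idx = np.clip(np.rint(image * (N - 1)).astype(int), 0, N - 1)
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]

img = np.random.rand(8, 8, 3)
out = apply_lut(img)  # close to pointwise_net(img), up to quantization
```

The lookup output differs from the network output only by the color-quantization error, which shrinks as the grid is refined; the trade-off between table size and fidelity is exactly what makes LUT inference cheap on edge hardware.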

URL

https://arxiv.org/abs/2403.19238

PDF

https://arxiv.org/pdf/2403.19238.pdf

