Wireless communications at high-frequency bands with large antenna arrays face challenges in beam management, which can potentially be improved by multimodal sensing information from cameras, LiDAR, radar, and GPS. In this paper, we present a multimodal transformer deep learning framework for sensing-assisted beam prediction. We employ a convolutional neural network to extract features from sequences of images, point clouds, and radar raw data sampled over time. At each convolutional layer, we use transformer encoders to learn the hidden relations between feature tokens from different modalities and time instances in an abstraction space and produce encoded vectors for the next level of feature extraction. We train the model on combinations of different modalities with supervised learning. We mitigate the effect of imbalanced data by utilizing focal loss and an exponential moving average of model weights. We also evaluate data processing and augmentation techniques such as image enhancement, segmentation, background filtering, multimodal data flipping, radar signal transformation, and GPS angle calibration. Experimental results show that our solution trained on image and GPS data produces the best distance-based accuracy of predicted beams, 78.44%, with effective generalization to unseen day scenarios (near 73%) and night scenarios (over 84%). This outperforms using other modalities and alternative data processing techniques, demonstrating the effectiveness of transformers with feature fusion for radio beam prediction from images and GPS. Furthermore, our solution could be pretrained on large sequences of multimodal wireless data and fine-tuned for multiple downstream radio network tasks.
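As a concrete illustration of the two training aids mentioned above, the snippet below sketches a multi-class focal loss and an exponential-moving-average weight update in PyTorch. The beam-index classification head, the hyperparameters (gamma, alpha, decay), and the update placement are illustrative assumptions, not the authors' exact configuration; focal loss simply down-weights frequently seen beams so rare beam indices contribute more to the gradient.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Multi-class focal loss over beam-index logits of shape (N, num_beams);
    targets is a LongTensor of shape (N,). Hyperparameters are illustrative."""
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p of true beam
    pt = log_pt.exp()
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()

@torch.no_grad()
def update_ema(ema_model, model, decay=0.999):
    """Generic EMA of model weights: shadow = decay * shadow + (1 - decay) * param.
    Called after each optimizer step; the decay value is an assumption."""
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
```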
https://arxiv.org/abs/2309.11811
Most existing sandstorm image enhancement methods are based on traditional theory and prior knowledge, which often restrict their applicability in real-world scenarios. In addition, these approaches often adopt a strategy of color correction followed by dust removal, which makes the algorithm structure overly complex. To solve this issue, we introduce a novel image restoration model, named the all-in-one sandstorm removal network (AOSR-Net). This model is developed based on a re-formulated sandstorm scattering model, which directly establishes the image mapping relationship by integrating intermediate parameters. Such an integration scheme effectively addresses the problems of over-enhancement and weak generalization in the field of sand-dust image enhancement. Experimental results on synthetic and real-world sandstorm images demonstrate the superiority of the proposed AOSR-Net over state-of-the-art (SOTA) algorithms.
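For context, the sketch below writes out the classical scattering model that sandstorm formulations typically start from, together with one common way of "integrating intermediate parameters" into a single learned map (the K-map reformulation popularized by AOD-Net for haze). It is shown purely for illustration and is not necessarily AOSR-Net's exact re-formulation.

```python
import numpy as np

def scattering_forward(J, t, A):
    """Classical scattering model: observed I = J * t + A * (1 - t),
    with clean image J, transmission t, and global airlight/dust color A."""
    return J * t + A * (1.0 - t)

def one_step_restore(I, K, b=1.0):
    """Single-map reformulation (AOD-Net style, illustration only): fold t and A
    into one learned map K so the restored image is J = K * I - K + b."""
    return K * I - K + b
```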
https://arxiv.org/abs/2309.08838
Recent image enhancement methods have shown the advantages of using a pair of long and short-exposure images for low-light photography. These image modalities offer complementary strengths and weaknesses. The former yields an image that is clean but blurry due to camera or object motion, whereas the latter is sharp but noisy due to low photon count. Motivated by the fact that modern smartphones come equipped with multiple rear-facing camera sensors, we propose a novel dual-camera method for obtaining a high-quality image. Our method uses a synchronized burst of short exposure images captured by one camera and a long exposure image simultaneously captured by another. Having a synchronized short exposure burst alongside the long exposure image enables us to (i) obtain better denoising by using a burst instead of a single image, (ii) recover motion from the burst and use it for motion-aware deblurring of the long exposure image, and (iii) fuse the two results to further enhance quality. Our method is able to achieve state-of-the-art results on synthetic dual-camera images from the GoPro dataset with five times fewer training parameters compared to the next best method. We also show that our method qualitatively outperforms competing approaches on real synchronized dual-camera captures.
https://arxiv.org/abs/2309.08826
Real-time transportation surveillance is an essential part of the intelligent transportation system (ITS). However, images captured under low-light conditions often suffer from poor visibility and various types of degradation, such as noise interference and vague edge features. With the development of imaging devices, the resolution of visual surveillance data keeps increasing, e.g., to 2K and 4K, which places stricter requirements on the efficiency of image processing. To satisfy the requirements on both enhancement quality and computational speed, this paper proposes a double-domain-guided real-time low-light image enhancement network (DDNet) for ultra-high-definition (UHD) transportation surveillance. Specifically, we design an encoder-decoder structure as the main architecture of the learning network. In particular, the enhancement processing is divided into two subtasks (i.e., color enhancement and gradient enhancement) via the proposed coarse enhancement module (CEM) and the LoG-based gradient enhancement module (GEM), which are embedded in the encoder-decoder structure. This enables the network to enhance the color and edge features simultaneously. Through decomposition and reconstruction in both the color and gradient domains, our DDNet can restore the detailed feature information concealed by darkness with better visual quality and efficiency. Evaluation experiments on standard and transportation-related datasets demonstrate that our DDNet provides superior enhancement quality and efficiency compared with state-of-the-art methods. Besides, object detection and scene segmentation experiments indicate the practical benefits for higher-level image analysis under low-light environments in ITS.
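The gradient branch above is built around Laplacian-of-Gaussian (LoG) responses; the snippet below shows a minimal way to compute such a response map with SciPy. The smoothing scale `sigma` is chosen only for illustration and is not taken from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_gradient_map(gray, sigma=1.0):
    """Laplacian-of-Gaussian response of a single-channel (grayscale) image,
    usable as an edge/gradient target for a gradient-enhancement branch."""
    return gaussian_laplace(gray.astype(np.float32), sigma=sigma)
```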
https://arxiv.org/abs/2309.08382
The goal of low-light image enhancement is to restore the color and details of the image, which is of great significance for high-level visual tasks in autonomous driving. However, it is difficult to restore the lost details in dark areas by relying only on the RGB domain. In this paper we introduce frequency as a new clue into the network and propose a novel DCT-driven enhancement transformer (DEFormer). First, we propose a learnable frequency branch (LFB) for frequency enhancement that contains DCT processing and curvature-based frequency enhancement (CFE). CFE calculates the curvature of each channel to represent the detail richness of different frequency bands; we then divide the frequency features to focus on the frequency bands with richer textures. In addition, we propose a cross-domain fusion (CDF) module for reducing the differences between the RGB domain and the frequency domain. We also adopt DEFormer as a preprocessing step for detection in the dark; DEFormer effectively improves the performance of the detector, bringing 2.1% and 3.4% mAP improvements on the ExDark and DARK FACE datasets, respectively.
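To make the frequency branch concrete, the sketch below applies a 2-D DCT to one channel and splits the coefficients into radial frequency bands. The equal-width radial split stands in for the paper's curvature-based band selection (CFE), which instead scores bands by detail richness.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_band_split(channel, num_bands=4):
    """2-D DCT of one channel, split into radial frequency bands and mapped back
    to the spatial domain per band (illustrative split, not the CFE rule)."""
    coeffs = dctn(channel, norm="ortho")
    h, w = coeffs.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.sqrt((yy / h) ** 2 + (xx / w) ** 2) / np.sqrt(2)  # 0 (DC) .. ~1 (highest)
    bands = []
    for b in range(num_bands):
        mask = (radius >= b / num_bands) & (radius < (b + 1) / num_bands)
        bands.append(idctn(coeffs * mask, norm="ortho"))
    return bands
```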
https://arxiv.org/abs/2309.06941
Underwater object detection faces the problem of underwater image degradation, which affects the performance of the detector. Underwater object detection methods based on noise reduction and image enhancement usually either do not provide the images preferred by the detector or require additional datasets. In this paper, we propose a plug-and-play Underwater joint image enhancement Module (UnitModule) that provides the input image preferred by the detector. We design an unsupervised learning loss for the joint training of UnitModule with the detector, without additional datasets, to improve the interaction between UnitModule and the detector. Furthermore, a color cast predictor with an assisting color cast loss and a data augmentation called Underwater Color Random Transfer (UCRT) are designed to improve the performance of UnitModule on underwater images with different color casts. Extensive experiments are conducted on DUO for different object detection models, where UnitModule achieves the highest performance improvement of 2.6 AP for YOLOv5-S and gains an improvement of 3.3 AP on the brand-new test set (URPCtest). UnitModule significantly improves the performance of all object detection models we test, especially for models with a small number of parameters. In addition, with only 31K parameters, UnitModule has little effect on the inference speed of the original object detection model. Our quantitative and visual analysis also demonstrates the effectiveness of UnitModule in enhancing the input image and improving the perception ability of the detector for object features.
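The UCRT augmentation above randomizes the color cast seen during training; a simple stand-in is the per-channel mean/std transfer sketched below, which borrows the cast of a reference underwater image. The paper's exact transfer rule may differ.

```python
import numpy as np

def color_cast_transfer(src, ref):
    """Transfer per-channel mean/std color statistics from `ref` onto `src`
    (stand-in for UCRT). Inputs are float RGB arrays in [0, 1] of shape (H, W, 3)."""
    out = np.empty_like(src)
    for c in range(3):
        s_mu, s_std = src[..., c].mean(), src[..., c].std() + 1e-6
        r_mu, r_std = ref[..., c].mean(), ref[..., c].std() + 1e-6
        out[..., c] = (src[..., c] - s_mu) / s_std * r_std + r_mu
    return np.clip(out, 0.0, 1.0)
```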
https://arxiv.org/abs/2309.04708
Underwater images suffer from complex and diverse degradation, which inevitably affects the performance of underwater visual tasks. However, most existing learning-based underwater image enhancement (UIE) methods mainly restore such degradations in the spatial domain and rarely pay attention to Fourier frequency information. In this paper, we develop a novel UIE framework based on spatial-frequency interaction and gradient maps, namely SFGNet, which consists of two stages. Specifically, in the first stage, we propose a dense spatial-frequency fusion network (DSFFNet), mainly including our designed dense Fourier fusion block and dense spatial fusion block, achieving sufficient spatial-frequency interaction through cross connections between these two blocks. In the second stage, we propose a gradient-aware corrector (GAC) to further enhance perceptual details and geometric structures of images via the gradient map. Experimental results on two real-world underwater image datasets show that our approach can successfully enhance underwater images and achieves competitive performance in visual quality improvement.
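For readers unfamiliar with the Fourier side of such frameworks, the snippet below shows the basic amplitude/phase split that a frequency branch typically operates on; it is a generic sketch, not DSFFNet's dense Fourier fusion block.

```python
import numpy as np

def fourier_split(channel):
    """Split one image channel into Fourier amplitude and phase."""
    spec = np.fft.fft2(channel)
    return np.abs(spec), np.angle(spec)

def fourier_merge(amplitude, phase):
    """Recombine (possibly modified) amplitude and phase back to the spatial domain."""
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * phase)))
```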
https://arxiv.org/abs/2309.04089
Diseases such as diabetic retinopathy and age-related macular degeneration pose a significant risk to vision, highlighting the importance of precise segmentation of retinal vessels for the tracking and diagnosis of progression. However, existing vessel segmentation methods that heavily rely on encoder-decoder structures struggle to capture contextual information about retinal vessel configurations, leading to challenges in reconciling semantic disparities between encoder and decoder features. To address this, we propose a novel feature enhancement segmentation network (FES-Net) that achieves accurate pixel-wise segmentation without requiring additional image enhancement steps. FES-Net directly processes the input image and utilizes four prompt convolutional blocks (PCBs) during downsampling, complemented by a shallow upsampling approach to generate a binary mask for each class. We evaluate the performance of FES-Net on four publicly available state-of-the-art datasets: DRIVE, STARE, CHASE, and HRF. The evaluation results clearly demonstrate the superior performance of FES-Net compared to other competitive approaches documented in the existing literature.
https://arxiv.org/abs/2309.03535
In this paper, we present an approach to image enhancement with a diffusion model in underwater scenes. Our method adapts conditional denoising diffusion probabilistic models to generate the corresponding enhanced images by using the underwater images and Gaussian noise as the inputs. Additionally, in order to improve the efficiency of the reverse process in the diffusion model, we adopt two different strategies. We first propose a lightweight transformer-based denoising network, which effectively reduces the per-iteration forward time of the network. On the other hand, we introduce a skip sampling strategy to reduce the number of iterations. Besides, based on the skip sampling strategy, we propose two different non-uniform sampling methods for the sequence of time steps, namely piecewise sampling and searching with an evolutionary algorithm. Both are effective and can further improve performance with the same number of steps compared to the previous uniform sampling. In the end, we conduct a comparative evaluation on widely used underwater enhancement datasets between recent state-of-the-art methods and the proposed approach. The experimental results prove that our approach can achieve both competitive performance and high efficiency. Our code is available at this https URL.
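One way to realize the piecewise (non-uniform) skip-sampling schedule described above is sketched below: a fixed step budget is spread unevenly over the reverse diffusion trajectory so more steps land in the late, detail-forming part. All numbers are illustrative assumptions; the paper additionally searches such schedules with an evolutionary algorithm.

```python
import numpy as np

def piecewise_timesteps(t_max=1000, n_steps=50, split=0.3, late_ratio=0.7):
    """Return a descending list of timesteps: `late_ratio` of the budget is spent
    on the last `split` fraction of the trajectory (small t), the rest elsewhere."""
    n_late = int(n_steps * late_ratio)
    n_early = n_steps - n_late
    early = np.linspace(t_max - 1, t_max * split, n_early, endpoint=False)
    late = np.linspace(t_max * split, 0, n_late)
    steps = np.unique(np.concatenate([early, late]).astype(int))[::-1]
    return steps
```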
https://arxiv.org/abs/2309.03445
Deep neural networks have achieved remarkable progress in enhancing low-light images by improving their brightness and eliminating noise. However, most existing methods construct end-to-end mapping networks heuristically, neglecting the intrinsic prior of the image enhancement task and lacking transparency and interpretability. Although some unfolding solutions have been proposed to relieve these issues, they rely on proximal operator networks that deliver ambiguous and implicit priors. In this work, we propose a paradigm for low-light image enhancement that explores the potential of customized learnable priors to improve the transparency of the deep unfolding paradigm. Motivated by the powerful feature representation capability of the Masked Autoencoder (MAE), we customize MAE-based illumination and noise priors and redevelop them from two perspectives: 1) structure flow: we train the MAE from a normal-light image to its illumination properties and then embed it into the proximal operator design of the unfolding architecture; and 2) optimization flow: we train the MAE from a normal-light image to its gradient representation and then employ it as a regularization term to constrain noise in the model output. These designs improve the interpretability and representation capability of the model. Extensive experiments on multiple low-light image enhancement datasets demonstrate the superiority of our proposed paradigm over state-of-the-art methods. Code is available at this https URL.
https://arxiv.org/abs/2309.01958
Due to the uneven scattering and absorption of different light wavelengths in aquatic environments, underwater images suffer from low visibility and noticeable color deviations. With the advancement of autonomous underwater vehicles, extensive research has been conducted on learning-based underwater enhancement algorithms. These works can generate visually pleasing enhanced images and mitigate the adverse effects of degraded images on subsequent perception tasks. However, learning-based methods are susceptible to the inherent fragility of adversarial attacks, causing significant disruption in results. In this work, we introduce a collaborative adversarial resilience network, dubbed CARNet, for underwater image enhancement and subsequent detection tasks. Concretely, we first introduce an invertible network with strong perturbation-perceptual abilities to isolate attacks from underwater images, preventing interference with image enhancement and perceptual tasks. Furthermore, we propose a synchronized attack training strategy with both visual-driven and perception-driven attacks, enabling the network to discern and remove various types of attacks. Additionally, we incorporate an attack pattern discriminator to heighten the robustness of the network against different attacks. Extensive experiments demonstrate that the proposed method outputs visually appealing enhanced images and achieves on average 6.71% higher detection mAP than state-of-the-art methods.
https://arxiv.org/abs/2309.01102
Fundus photography is prone to image quality degradation that impacts clinical examination performed by ophthalmologists or intelligent systems. Though enhancement algorithms have been developed to promote fundus observation on degraded images, high data demands and limited applicability hinder their clinical deployment. To circumvent this bottleneck, a generic fundus image enhancement network (GFE-Net) is developed in this study to robustly correct unknown fundus images without supervision or extra data. Leveraging image frequency information, self-supervised representation learning is conducted to learn robust structure-aware representations from degraded images. Then, with a seamless architecture that couples representation learning and image enhancement, GFE-Net can accurately correct fundus images while preserving retinal structures. Comprehensive experiments are implemented to demonstrate the effectiveness and advantages of GFE-Net. Compared with state-of-the-art algorithms, GFE-Net achieves superior performance in data dependency, enhancement performance, deployment efficiency, and scale generalizability. Follow-up fundus image analysis is also facilitated by GFE-Net, whose modules are respectively verified to be effective for image enhancement.
https://arxiv.org/abs/2309.00885
This paper presents a new dataset and a general tracker enhancement method for Underwater Visual Object Tracking (UVOT). Despite its significance, underwater tracking has remained unexplored due to data inaccessibility. It poses distinct challenges: the underwater environment exhibits non-uniform lighting conditions, low visibility, lack of sharpness, low contrast, camouflage, and reflections from suspended particles. The performance of traditional tracking methods designed primarily for terrestrial or open-air scenarios drops in such conditions. We address the problem by proposing a novel underwater image enhancement algorithm designed specifically to boost tracking quality. The method yields a significant performance improvement, of up to 5.0% AUC, for state-of-the-art (SOTA) visual trackers. To develop robust and accurate UVOT methods, large-scale datasets are required. To this end, we introduce a large-scale UVOT benchmark dataset consisting of 400 video segments and 275,000 manually annotated frames, enabling underwater training and evaluation of deep trackers. The videos are labelled with several underwater-specific tracking attributes including watercolor variation, target distractors, camouflage, target relative size, and low-visibility conditions. The UVOT400 dataset, tracking results, and the code are publicly available at: this https URL.
https://arxiv.org/abs/2308.15816
Technologies for human action recognition in the dark are gaining more and more attention, driven by huge demand in surveillance, motion control, and human-computer interaction. However, because of limitations in image enhancement methods and in low-light video datasets (e.g., labeling cost), existing methods face several problems. Some video-based approaches are effective and efficient on specific datasets but cannot generalize to most cases, while other methods using multiple sensors rely heavily on prior knowledge to deal with the noisy nature of the video stream. In this paper, we propose an action recognition method using a deep multi-input network. Furthermore, we propose an Independent Gamma Intensity Correction (Ind-GIC) to enhance poorly illuminated video, generating one gamma per frame to increase enhancement performance. To show that our method is effective, we evaluate and compare it against existing methods. Experimental results show that our model achieves high accuracy on the ARID dataset.
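A minimal sketch of per-frame gamma correction in the spirit of Ind-GIC is given below: each frame gets its own gamma, here chosen so that the corrected mean intensity reaches a target value. The gamma-selection rule and the target are illustrative assumptions; the paper defines its own correction.

```python
import numpy as np

def per_frame_gamma(frame, target_mean=0.5, eps=1e-6):
    """Pick one gamma for this frame so that mean(frame)**gamma == target_mean,
    then apply the correction. `frame` is a float array in [0, 1]."""
    mean = np.clip(frame.mean(), eps, 1.0 - eps)
    gamma = np.log(target_mean) / np.log(mean)
    return np.power(np.clip(frame, 0.0, 1.0), gamma)

# Applied independently to every frame of a low-light clip:
# enhanced = [per_frame_gamma(f) for f in video_frames]
```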
https://arxiv.org/abs/2308.15345
Lightness adaptation is vital to the success of image processing to avoid unexpected visual deterioration, which covers multiple aspects, e.g., low-light image enhancement, image retouching, and inverse tone mapping. Existing methods typically work well on their trained lightness conditions but perform poorly in unknown ones due to their limited generalization ability. To address this limitation, we propose a novel generalized lightness adaptation algorithm that extends conventional normalization techniques through a channel filtering design, dubbed Channel Selective Normalization (CSNorm). The proposed CSNorm purposely normalizes the statistics of lightness-relevant channels and keeps other channels unchanged, so as to improve feature generalization and discrimination. To optimize CSNorm, we propose an alternating training strategy that effectively identifies lightness-relevant channels. The model equipped with our CSNorm only needs to be trained on one lightness condition and can be well generalized to unknown lightness conditions. Experimental results on multiple benchmark datasets demonstrate the effectiveness of CSNorm in enhancing the generalization ability for the existing lightness adaptation methods. Code is available at this https URL.
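A compact sketch of the channel-selective idea described above is shown below: only gated channels are instance-normalized while the rest pass through unchanged. The plain learnable gate stands in for the paper's alternating training strategy that identifies lightness-relevant channels.

```python
import torch
import torch.nn as nn

class ChannelSelectiveNorm(nn.Module):
    """Normalize the statistics of gated channels only (CSNorm-style sketch)."""
    def __init__(self, channels):
        super().__init__()
        self.gate_logits = nn.Parameter(torch.zeros(channels))  # learned per-channel gate
        self.norm = nn.InstanceNorm2d(channels, affine=True)

    def forward(self, x):
        gate = torch.sigmoid(self.gate_logits).view(1, -1, 1, 1)  # soft selection in [0, 1]
        return gate * self.norm(x) + (1.0 - gate) * x
```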
https://arxiv.org/abs/2308.13783
In this paper, we rethink the low-light image enhancement task and propose a physically explainable and generative diffusion model for low-light image enhancement, termed Diff-Retinex. We aim to integrate the advantages of the physical model and the generative network. Furthermore, we hope to supplement and even deduce the information missing in the low-light image through the generative network. Therefore, Diff-Retinex formulates the low-light image enhancement problem into Retinex decomposition and conditional image generation. In the Retinex decomposition, we integrate the superiority of attention in the Transformer and meticulously design a Retinex Transformer decomposition network (TDN) to decompose the image into illumination and reflectance maps. Then, we design multi-path generative diffusion networks to reconstruct the normal-light Retinex probability distribution and solve the various degradations in these components respectively, including dark illumination, noise, color deviation, loss of scene contents, etc. Owing to the generative diffusion model, Diff-Retinex puts the restoration of low-light subtle details into practice. Extensive experiments conducted on real-world low-light datasets qualitatively and quantitatively demonstrate the effectiveness, superiority, and generalization of the proposed method.
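For reference, the classical single-scale Retinex decomposition that the learned TDN replaces can be written in a few lines. The Gaussian-blur illumination estimate below only illustrates the I = L x R split; it is not the paper's Transformer-based decomposition.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def retinex_decompose(luminance, sigma=15.0, eps=1e-4):
    """Classical single-scale Retinex split of a grayscale/luminance image:
    illumination L from heavy Gaussian smoothing, reflectance R = I / L."""
    illumination = gaussian_filter(luminance, sigma=sigma) + eps
    reflectance = luminance / illumination
    return illumination, reflectance
```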
https://arxiv.org/abs/2308.13164
Low-light images, characterized by inadequate illumination, pose challenges of diminished clarity, muted colors, and reduced details. Low-light image enhancement, an essential task in computer vision, aims to rectify these issues by improving brightness, contrast, and overall perceptual quality, thereby facilitating accurate analysis and interpretation. This paper introduces the Convolutional Dense Attention-guided Network (CDAN), a novel solution for enhancing low-light images. CDAN integrates an autoencoder-based architecture with convolutional and dense blocks, complemented by an attention mechanism and skip connections. This architecture ensures efficient information propagation and feature learning. Furthermore, a dedicated post-processing phase refines color balance and contrast. Our approach demonstrates notable progress compared to state-of-the-art results in low-light image enhancement, showcasing its robustness across a wide range of challenging scenarios. Our model performs remarkably on benchmark datasets, effectively mitigating under-exposure and proficiently restoring textures and colors in diverse low-light scenarios. This achievement underscores CDAN's potential for diverse computer vision tasks, notably enabling robust object detection and recognition in challenging low-light conditions.
https://arxiv.org/abs/2308.12902
Visual restoration of underwater scenes is crucial for visual tasks, and avoiding interference from underwater media has become a prominent concern. In this work, we present synergistic multiscale detail refinement via intrinsic supervision (SMDR-IS) to recover underwater scene details. The low-degradation stage provides multiscale detail for the original stage, achieving synergistic multiscale detail refinement through feature propagation via the adaptive selective intrinsic supervised feature module (ASISF). ASISF is developed using intrinsic supervision to precisely control and guide feature transmission across the multi-degradation stages, improving multiscale detail refinement while reducing interference from irrelevant scene information from the low-degradation stage. Additionally, within the multi-degradation encoder-decoder of SMDR-IS, we introduce a bifocal intrinsic-context attention module (BICA). This module is designed to effectively leverage the multi-scale scene information found in images, using intrinsic supervision principles as its foundation. BICA facilitates the guidance of higher-resolution spaces by leveraging lower-resolution spaces, considering the significant dependency of underwater image restoration on spatial contextual relationships. During training, the network benefits from the integration of a multi-degradation loss function, which serves as a constraint enabling the network to effectively exploit information across various scales. When compared with state-of-the-art methods, SMDR-IS demonstrates outstanding performance. Code will be made publicly available.
https://arxiv.org/abs/2308.11932
Existing unsupervised low-light image enhancement methods lack enough effectiveness and generalization in practical applications. We suppose this is because of the absence of explicit supervision and the inherent gap between real-world scenarios and the training data domain. In this paper, we develop diffusion-based domain calibration to realize more robust and effective unsupervised Low-Light Enhancement, called DiffLLE. Since the diffusion model exhibits impressive denoising capability and has been trained on massive clean images, we adopt it to bridge the gap between the real low-light domain and the training degradation domain, while providing efficient priors of real-world content for unsupervised models. Specifically, we adopt a naive unsupervised enhancement algorithm to realize preliminary restoration and design two zero-shot plug-and-play modules based on the diffusion model to improve generalization and effectiveness. The Diffusion-guided Degradation Calibration (DDC) module narrows the gap between real-world and training low-light degradation through diffusion-based domain calibration and a lightness enhancement curve, which makes the enhancement model perform robustly even under sophisticated wild degradation. Due to the limited enhancement effect of the unsupervised model, we further develop the Fine-grained Target domain Distillation (FTD) module to find a more visual-friendly solution space. It exploits the priors of the pre-trained diffusion model to generate pseudo-references, which shrinks the preliminary restored results from a coarse normal-light domain to a finer high-quality clean field, addressing the lack of strong explicit supervision for unsupervised methods. Benefiting from these, our approach even outperforms some supervised methods by using only a simple unsupervised baseline. Extensive experiments demonstrate the superior effectiveness of the proposed DiffLLE.
https://arxiv.org/abs/2308.09279
This paper presents a novel network structure with illumination-aware gamma correction and complete image modelling to solve the low-light image enhancement problem. Low-light environments usually lead to less informative large-scale dark areas, so directly learning deep representations from low-light images is insensitive to recovering normal illumination. We propose to integrate the effectiveness of gamma correction with the strong modelling capacities of deep networks, which enables the correction factor gamma to be learned in a coarse-to-elaborate manner by adaptively perceiving the deviated illumination. Because the exponential operation introduces high computational complexity, we propose to use a Taylor series to approximate gamma correction, accelerating the training and inference speed. Dark areas usually occupy large scales in low-light images, so common local modelling structures, e.g., CNN and SwinIR, are insufficient to recover accurate illumination across whole low-light images. We propose a novel Transformer block to completely simulate the dependencies of all pixels across the image via a local-to-global hierarchical attention mechanism, so that dark areas can be inferred by borrowing information from distant informative regions in a highly effective manner. Extensive experiments on several benchmark datasets demonstrate that our approach outperforms state-of-the-art methods.
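The Taylor-series shortcut mentioned above can be sketched in a few lines: x^gamma = exp(gamma * ln x) is replaced by a truncated Taylor expansion of the exponential. The truncation order and the input clamp below are illustrative choices, not the paper's.

```python
import numpy as np
from math import factorial

def gamma_taylor(x, gamma, order=4):
    """Approximate x**gamma = exp(gamma * ln x) with a truncated Taylor series
    of exp, avoiding the costly power op; accuracy grows with `order` and is
    reasonable while |gamma * ln x| stays moderate."""
    u = gamma * np.log(np.clip(x, 1e-6, 1.0))
    return sum(u ** k / factorial(k) for k in range(order + 1))
```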
https://arxiv.org/abs/2308.08220