Paper Reading AI Learner

TenAd: A Tensor-based Low-rank Black Box Adversarial Attack for Video Classification

2025-04-01 22:35:28
Kimia Haghjooei, Mansoor Rezghi

Abstract

Deep learning models have achieved remarkable success in computer vision but remain vulnerable to adversarial attacks, particularly in black-box settings where model details are unknown. Existing adversarial attack methods (even those that operate on key frames) often treat video data as simple vectors, ignoring its inherent multi-dimensional structure, and require a large number of queries, making them inefficient and detectable. In this paper, we propose TenAd, a novel tensor-based low-rank adversarial attack that leverages the multi-dimensional properties of video data by representing videos as fourth-order tensors. By exploiting a low-rank perturbation structure, our method significantly reduces the search space and the number of queries needed to generate adversarial examples in black-box settings. Experimental results on standard video classification datasets demonstrate that TenAd generates imperceptible adversarial perturbations while achieving higher attack success rates and query efficiency than state-of-the-art methods. Our approach outperforms existing black-box adversarial attacks in terms of success rate, query efficiency, and perturbation imperceptibility, highlighting the potential of tensor-based methods for adversarial attacks on video models.
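
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: a score-based black-box attack whose perturbation is constrained to a low-rank CP (rank-R) decomposition of a fourth-order video tensor (frames x height x width x channels), so the search runs over the factor entries rather than over every pixel. The CP parameterization, the random-search update rule, and the `query_fn` interface are illustrative assumptions, not TenAd's actual method.

```python
# Hypothetical sketch of a low-rank black-box attack on a video tensor.
# Nothing here is taken from the TenAd paper; the CP parameterization and
# the random-search loop are assumptions made only for illustration.
import numpy as np

def cp_perturbation(factors):
    """Build a rank-R perturbation from CP factors (U, V, W, S).

    U: (T, R), V: (H, R), W: (W_dim, R), S: (C, R) -> delta of shape (T, H, W_dim, C).
    """
    U, V, W, S = factors
    return np.einsum('tr,hr,wr,cr->thwc', U, V, W, S)

def black_box_attack(video, true_label, query_fn, rank=2, epsilon=0.03,
                     sigma=0.01, max_queries=1000, rng=None):
    """Score-based random search over low-rank CP factors.

    `query_fn(x)` is assumed to return a vector of class probabilities for one
    video in [0, 1]^(T, H, W, C); that is the only model access in the black-box setting.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, H, Wd, C = video.shape
    # Small random CP factors: the search space is R*(T+H+Wd+C) parameters
    # instead of T*H*Wd*C pixel values.
    factors = [sigma * rng.standard_normal((d, rank)) for d in (T, H, Wd, C)]
    best_score = query_fn(video)[true_label]          # true-class probability
    queries = 1
    while queries < max_queries:
        # Propose a small random update to one randomly chosen factor.
        i = rng.integers(len(factors))
        trial = [f.copy() for f in factors]
        trial[i] += sigma * rng.standard_normal(trial[i].shape)
        delta = np.clip(cp_perturbation(trial), -epsilon, epsilon)  # keep it imperceptible
        adv = np.clip(video + delta, 0.0, 1.0)
        probs = query_fn(adv)
        queries += 1
        if probs[true_label] < best_score:            # lower true-class score is better
            best_score, factors = probs[true_label], trial
        if probs.argmax() != true_label:              # untargeted success
            return adv, queries
    delta = np.clip(cp_perturbation(factors), -epsilon, epsilon)
    return np.clip(video + delta, 0.0, 1.0), queries
```

For a 16x112x112x3 clip with rank 2, this parameterization has roughly 2*(16+112+112+3) = 486 free parameters versus about 600k pixel values, which illustrates the kind of search-space and query reduction the abstract attributes to the low-rank structure.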

URL

https://arxiv.org/abs/2504.01228

PDF

https://arxiv.org/pdf/2504.01228.pdf

