CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning

2022-11-15 15:48:28

Jinghuai Zhang, Hongbin Liu, Jinyuan Jia, Neil Zhenqiang Gong

arXiv_CV

Abstract

Contrastive learning (CL) pre-trains general-purpose encoders on an unlabeled pre-training dataset, which consists of images (single-modal CL) or image-text pairs (multi-modal CL). CL is vulnerable to data poisoning based backdoor attacks (DPBAs), in which an attacker injects poisoned inputs into the pre-training dataset so that the resulting encoder is backdoored. However, existing DPBAs achieve limited effectiveness. In this work, we propose new DPBAs to CL, called CorruptEncoder. Our experiments show that CorruptEncoder substantially outperforms existing DPBAs for both single-modal and multi-modal CL. CorruptEncoder is the first DPBA that achieves attack success rates above 90% on single-modal CL with only a few (3) reference images and a small poisoning ratio (0.5%). Moreover, we propose a defense, called localized cropping, to defend single-modal CL against DPBAs. Our results show that the defense can reduce the effectiveness of DPBAs, but at the cost of the encoder's utility, highlighting the need for new defenses.
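
To make the two quantities quoted above concrete, here is a minimal Python sketch of how a poisoning ratio and an attack success rate are commonly computed. It is illustrative only and is not taken from the paper: the function names, the dataset size of 100,000, and the assumption that the poisoning ratio is measured relative to the size of the pre-training dataset are all hypothetical. The attack success rate follows the usual backdoor-attack convention: the fraction of trigger-embedded test inputs that a downstream classifier assigns to the attacker's target class.

```python
def num_poisoned_inputs(dataset_size: int, poisoning_ratio: float) -> int:
    """Number of poisoned inputs implied by a poisoning ratio.

    Assumption (not from the paper): the ratio is taken relative to the
    size of the pre-training dataset.
    """
    if dataset_size < 0 or not 0.0 <= poisoning_ratio <= 1.0:
        raise ValueError("dataset_size must be >= 0 and ratio in [0, 1]")
    return round(dataset_size * poisoning_ratio)


def attack_success_rate(predictions, target_class) -> float:
    """Fraction of trigger-embedded test inputs predicted as the target class."""
    predictions = list(predictions)
    if not predictions:
        raise ValueError("predictions must be non-empty")
    hits = sum(1 for p in predictions if p == target_class)
    return hits / len(predictions)


# Hypothetical numbers for illustration only.
poisoned = num_poisoned_inputs(100_000, 0.005)  # 0.5% poisoning ratio
asr = attack_success_rate(["cat", "dog", "cat", "cat"], "cat")
print(poisoned, asr)
```

Under these assumptions, a 0.5% poisoning ratio on a hypothetical 100,000-image dataset corresponds to 500 poisoned images, which shows how small the attacker's footprint is relative to the clean data.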

URL

https://arxiv.org/abs/2211.08229

PDF

https://arxiv.org/pdf/2211.08229.pdf