Paper Reading AI Learner

Revisiting Mid-Level Patterns for Distant-Domain Few-Shot Recognition

2020-08-07 12:45:39
Yixiong Zou, Shanghang Zhang, José M. F. Moura, JianPeng Yu, Yonghong Tian

Abstract

Cross-domain few-shot learning (FSL) is proposed recently to transfer knowledge from general-domain known classes (e.g., ImageNet) to novel classes in other domains, and recognize novel classes with only few training samples. In this paper, we go further to define a more challenging scenario that transfers knowledge from general-domain known classes to novel classes in distant domains which are significantly different from the general domain, e.g., medical data. To solve this challenging problem, we propose to exploit mid-level features, which are more transferable, yet under-explored in recent main-stream FSL works. To boost the discriminability of mid-level features, we propose a residual-prediction task for the training on known classes. In this task, we view the current training sample as a sample from a pseudo-novel class, so as to provide simulated novel-class data. However, this simulated data is from the same domain as known classes, and shares high-level patterns with other known classes. Therefore, we then use high-level patterns from other known classes to represent it and remove this high-level representation from the simulated data, outputting a residual term containing discriminative information of it that could not be represented by high-level patterns from other known classes. Then, mid-level features from multiple mid-layers are dynamically weighted to predict this residual term, which encourages the mid-level features to be discriminative. Notably, our method can be applied to both the regular in-domain FSL setting by emphasizing high-level & transformed mid-level features and the distant-domain FSL setting by emphasizing mid-level features. Experiments under both settings on six public datasets (including two challenging medical datasets) validate the rationale of the proposed method, demonstrating state-of-the-art performance on both settings.

Abstract (translated)

URL

https://arxiv.org/abs/2008.03128

PDF

https://arxiv.org/pdf/2008.03128.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot