Trailers12k: Evaluating Transfer Learning for Movie Trailer Genre Classification

Abstract

Transfer learning is a cornerstone for a wide range of computer vision problems. It has been broadly studied for image analysis tasks. However, literature for video analysis is scarce and has mainly focused on transferring representations learned from ImageNet to human action recognition tasks. In this paper, we study transfer learning for Multi-label Movie Trailer Genre Classification (MTGC). In particular, we introduce Trailers12k, a new manually curated movie trailer dataset, and evaluate the transferability of spatial and spatio-temporal representations learned from ImageNet and/or Kinetics to Trailers12k MTGC. To reduce the spatio-temporal structure gap between the source and target tasks and improve transferability, we propose a method that performs shot detection so as to segment the trailer into highly correlated clips. We study different aspects that influence transferability, such as segmentation strategy, frame rate, input video extension, and spatio-temporal modeling. Our results demonstrate that representations learned on either ImageNet or Kinetics are comparatively transferable to Trailers12k, although they provide complementary information that can be combined to improve classification performance. With a similar number of parameters and FLOPs, Transformers provide a better transferability base than ConvNets. Nevertheless, competitive performance can be achieved using lightweight ConvNets, which makes them an attractive option for low-resource environments.
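
The abstract does not say which shot detector is used, so the following is only a minimal sketch of the general idea of cutting a trailer into clips at shot boundaries, not the authors' method. It assumes a simple histogram-difference heuristic, and every name in it (`frame_histogram`, `detect_shot_boundaries`, `split_into_clips`), the `bins` and `threshold` values, and the representation of a frame as a flat list of grayscale intensities in 0..255 are hypothetical choices made for illustration.

```python
def frame_histogram(frame, bins=16):
    """Normalized intensity histogram of one frame.

    `frame` is assumed to be a flat sequence of grayscale values in 0..255.
    """
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = max(len(frame), 1)
    return [c / total for c in counts]


def detect_shot_boundaries(frames, threshold=0.5):
    """Return every index i at which frame i is taken to start a new shot.

    A boundary is declared when the L1 distance between the histograms of
    consecutive frames (a value between 0 and 2) exceeds `threshold`.
    The threshold is an illustrative assumption, not a tuned value.
    """
    hists = [frame_histogram(f) for f in frames]
    boundaries = []
    for i in range(1, len(hists)):
        dist = sum(abs(a - b) for a, b in zip(hists[i], hists[i - 1]))
        if dist > threshold:
            boundaries.append(i)
    return boundaries


def split_into_clips(num_frames, boundaries):
    """Turn boundary indices into (start, end) frame ranges, end exclusive."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [num_frames]
    return list(zip(starts, ends))


# Toy input: five dark frames followed by four bright frames.
dark = [10] * 64
bright = [200] * 64
frames = [dark] * 5 + [bright] * 4
boundaries = detect_shot_boundaries(frames)
clips = split_into_clips(len(frames), boundaries)
```

On this toy input the single abrupt brightness change yields one boundary and therefore two clips. Each resulting clip could then be passed to a pretrained backbone as a unit, which is the role shot-based segmentation plays in the description above. Real trailers contain fades and fast motion that a fixed threshold handles poorly, so a practical detector would need to be more robust.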

URL

https://arxiv.org/abs/2210.07983

PDF

https://arxiv.org/pdf/2210.07983.pdf