Abstract
This notebook paper presents an overview and comparative analysis of our systems for three tasks in the ActivityNet Challenge 2019: trimmed action recognition, dense-captioning events in videos, and spatio-temporal action localization.
URL
https://arxiv.org/abs/1906.07016