Abstract
Human-centric perception covers a variety of vision tasks with widespread industrial applications, including surveillance, autonomous driving, and the metaverse. It is desirable to have a general pretrained model for versatile human-centric downstream tasks. This paper forges ahead along this path in terms of both benchmarks and pretraining methods. Specifically, we propose \textbf{HumanBench}, a benchmark built on existing datasets, to comprehensively evaluate, on common ground, the generalization abilities of different pretraining methods on 19 datasets from 6 diverse downstream tasks: person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting. To learn both coarse-grained and fine-grained knowledge of human bodies, we further propose a \textbf{P}rojector \textbf{A}ssis\textbf{T}ed \textbf{H}ierarchical pretraining method (\textbf{PATH}) that learns diverse knowledge at different granularity levels. Comprehensive evaluations on HumanBench show that PATH achieves new state-of-the-art results on 17 downstream datasets and on-par results on the other 2 datasets. The code will be publicly available at \href{this https URL}{this https URL}.
URL
https://arxiv.org/abs/2303.05675