Calisthenics Skills Temporal Video Segmentation

2025-07-16 13:55:27
Antonio Finocchiaro, Giovanni Maria Farinella, Antonino Furnari

Abstract

Calisthenics is a fast-growing bodyweight discipline that consists of different categories, one of which is focused on skills. Skills in calisthenics encompass both static and dynamic elements performed by athletes. The evaluation of static skills is based on their difficulty level and the duration of the hold. Automated tools that recognize isometric skills in a video and segment them to estimate their duration would help athletes in training and judges during competitions. Although the video understanding literature on action recognition through body pose analysis is rich, no previous work has specifically addressed the problem of temporal video segmentation of calisthenics skills. This study aims to provide an initial step towards the implementation of automated tools within the field of calisthenics. To advance knowledge in this context, we propose a dataset of video footage of static calisthenics skills performed by athletes. Each video is annotated with a temporal segmentation that determines the extent of each skill. We then report the results of a baseline approach to skill temporal segmentation on the proposed dataset. The results indicate that the proposed problem is feasible, though there is still room for improvement.
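To make the link between temporal segmentation and hold duration concrete, here is a minimal sketch, not the authors' baseline. It assumes a model has already produced one label per frame and simply collapses consecutive identical labels into timed segments. The function name, the "none" background label, the 25 fps frame rate, and the skill labels are all hypothetical choices made for illustration.

```python
from itertools import groupby


def segments_from_frame_labels(labels, fps, background="none"):
    """Collapse per-frame labels into (label, start_s, end_s, duration_s) tuples.

    `labels` holds one predicted label per video frame (an assumption about
    the model output); frames labeled `background` are skipped.
    """
    segments = []
    frame = 0  # index of the first frame of the current run
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # number of frames in this run
        if label != background:
            start = frame / fps
            end = (frame + length) / fps
            segments.append((label, start, end, end - start))
        frame += length
    return segments


# Hypothetical per-frame predictions for a 25 fps clip.
labels = ["none"] * 25 + ["planche"] * 50 + ["none"] * 25 + ["front_lever"] * 75

for label, start, end, duration in segments_from_frame_labels(labels, fps=25):
    print(f"{label}: {start:.1f}s to {end:.1f}s (held {duration:.1f}s)")
```

In practice, raw per-frame predictions tend to flicker, so some smoothing or a minimum segment length would likely be needed before the durations are usable for judging.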

URL

https://arxiv.org/abs/2507.12245

PDF

https://arxiv.org/pdf/2507.12245.pdf

