4094290

From beats to scores: a multi-modal framework for comprehensive figure skating assessment

Accurately quantifying the Technical Elements Score (TES) and Program Components Score (PCS) in figure skating is highly challenging and demands exceptional skill and professionalism. It requires not only meticulous observation of athletes' technical movements but also the ability to appreciate and assess artistic elements according to the current scoring criteria. Multi-modal large language models (MLLMs) are rapidly advancing in their ability to understand various forms of information, such as images, video, and audio. Previous research has mainly used audio and video in isolation, but effective evaluation in figure skating requires a unified approach. In our work, we developed a multi-modal method that first identifies sub-movements in audio-guided videos by leveraging the "hit the beat" phenomenon, and then integrates audio and video features specifically tailored for figure skating. This integration enhances the coordination between the two modalities. Building on this foundation, we propose a comprehensive assessment model that uses multi-modal series representation learning to derive TES and PCS and to generate text-based competition evaluations from video, audio, and contextual prompts. Extensive experiments show that the proposed method achieves state-of-the-art scoring ability and generalization performance. Our code is available at https://github.com/ycwfs/Figure-Skating-Quality-Assessment.
© Copyright 2025 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. Published by IEEE. All rights reserved.

Bibliographic Details
Keywords:
Notations: natural sciences and technology; technical sports
Tagging: artificial intelligence
Published in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops
Language: English
Published: Piscataway, NJ IEEE 2025
Online Access: https://openaccess.thecvf.com/content/CVPR2025W/CVSPORTS/html/Wang_From_Beats_to_Scores_A_Multi-Modal_Framework_for_Comprehensive_Figure_CVPRW_2025_paper.html
Pages: 5904-5913
Document types: Article
Level: high