Video interaction recognition using an attention augmented relational network and skeleton data

Recognizing interactions in multi-person videos, known as Video Interaction Recognition (VIR), is crucial for understanding video content. The human skeleton pose (skeleton for short) is a popular main feature for VIR, given its success for the task at hand. While many studies have made progress using complex architectures such as Graph Neural Networks (GNNs) and Transformers to capture interactions in videos, studies such as [33] that apply simple, easy-to-train, and adaptive architectures such as the Relational reasoning Network (RN) [37] yield competitive results. Inspired by this trend, we propose the Attention Augmented Relational Network (AARN), a straightforward yet effective model that uses skeleton data to recognize interactions in videos. AARN outperforms other RN-based models and remains competitive against larger, more intricate models. We evaluate our approach on a challenging real-world Hockey Penalty Dataset (HPD), whose videos depict complex interactions between players in a non-laboratory recording setup, in addition to popular benchmark datasets, demonstrating strong performance. Lastly, we show the impact of skeleton quality on classification accuracy and the struggle of off-the-shelf pose estimators to extract precise skeletons from the challenging HPD dataset.
© Copyright 2024 Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. Published by IEEE. All rights reserved.

Bibliographic details
Keywords:
Notations: Natural Sciences and Technology
Tagging: Network
Published in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops
Language: English
Published: Piscataway, NJ: IEEE, 2024
Online access: https://openaccess.thecvf.com/content/CVPR2024W/CVsports/html/Askari_Video_Interaction_Recognition_using_an_Attention_Augmented_Relational_Network_and_CVPRW_2024_paper.html
Pages: 3225-3234
Document type: Article
Level: high