Sports video captioning by attentive motion representation based hierarchical recurrent neural networks
Sports video captioning is the task of automatically generating a textual description of sports events (e.g. football, basketball or volleyball games). Although previous work has shown promising performance in producing coarse, general descriptions of a video, it remains challenging to caption a sports video that contains multiple fine-grained player actions and complex group relationships among players. In this paper, we present a novel hierarchical recurrent neural network (RNN) based framework with an attention mechanism for sports video captioning. A motion representation module is proposed to extract individual pose attributes and group-level trajectory cluster information. Moreover, we introduce a new dataset, Sports Video Captioning Dataset-Volleyball, for evaluation. We evaluate the proposed model on two public datasets and on the new dataset, and the experimental results demonstrate that our method outperforms state-of-the-art methods.
© Copyright 2018 MMSports'18: Proceedings of the 1st International Workshop on Multimedia Content Analysis in Sports. Published by Association for Computing Machinery. All rights reserved.
| Keywords: | |
|---|---|
| Classifications: | Natural sciences and technology; sports games |
| Published in: | MMSports'18: Proceedings of the 1st International Workshop on Multimedia Content Analysis in Sports |
| Language: | English |
| Published: | New York: Association for Computing Machinery, 2018 |
| Online access: | https://doi.org/10.1145/3265845.3265851 |
| Pages: | 77-85 |
| Document types: | Conference proceedings, conference report |
| Level: | high |