Low-cost optical tracking of soccer players

(German title: Kostengünstige optische Verfolgung von Fußballspielern)

Sports analytics are on the rise in European football; however, due to the high cost, so far only the top-tier leagues and championships have had the privilege of collecting high-precision data to build upon. We believe that this opportunity should be available to everyone, especially youth teams, so that talent can be developed and recognized earlier. We therefore set the goal of creating a low-cost player tracking system that could be applied across a wide base of football clubs and pitches, which in turn would widen the reach of sports analytics, ultimately assisting the work of scouts and coaches in general. In this paper, we present a low-cost optical tracking solution based on cheap action cameras and cloud-deployed data processing. We build on existing research results for player detection (background-foreground separation) and for tracking (Kalman filter), and we adapt those algorithms with the aim of sacrificing as little accuracy as possible while keeping costs low. The results are promising: our system yields significantly better accuracy than a standard deep-learning-based tracking model at a fraction of its cost. In fact, at a cost of $2.4 per match spent on cloud processing of videos for real-time results, all players can be tracked with an 11-meter precision on average.

Motivated by the fact that some shots are better than others, the expected goals (xG) metric attempts to quantify the quality of goal-scoring opportunities in soccer. The metric is becoming increasingly popular, making its way to TV analysts' desks. Yet a vastly underexplored topic in the context of xG is how these models are affected by the data on which they are trained. In this paper, we explore several data-related questions that may affect the performance of an xG model. We showed that the amount of data needed to train an accurate xG model depends on the complexity of the learner and the number of features, with up to 5 seasons of data needed to train a complex gradient boosted trees model. Despite the style of play changing over time and varying between leagues, we did not find that using only recent data or league-specific models improves the accuracy significantly. Hence, if limited data is available, training models on less recent data or on data from different leagues is a viable solution. Mixing data from multiple data sources should be avoided.
© Copyright 2020 Machine Learning and Data Mining for Sports Analytics. KU Leuven. Published by Springer. All rights reserved.

Bibliographic details
Keywords:
Classification: Game sports
Tagging: data mining
Published in: Machine Learning and Data Mining for Sports Analytics
Language: English
Published: Cham, Springer, 2020
Online access: http://doi.org/10.1007/978-3-030-64912-8_3
Volume: 1324
Pages: 28-39
Document type: Article
Level: high