Audio-visual classification of sports types

In this work we propose a method for classifying sports types from combined audio and visual features extracted from thermal video. From the audio, Mel Frequency Cepstral Coefficients (MFCCs) are extracted, and PCA is applied to reduce the feature space to 10 dimensions. From the visual modality, short trajectories are constructed to represent the motion of players. Four motion features are extracted from these trajectories and combined directly with the audio features. A k-nearest neighbour classifier is applied to 180 one-minute video sequences from three sports types. Using 10-fold cross-validation, a correct classification rate of 96.11% is obtained with multimodal features, compared to 86.67% and 90.00% using only visual or only audio features, respectively.
© Copyright 2015 IEEE International Conference on Computer Vision (ICCV) Workshops, 2015, Santiago. All rights reserved.
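
The fusion and evaluation steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the 10 PCA-reduced audio values and the four motion features have already been computed for each sequence, and it replaces them with synthetic random data. The function names, the choice of k = 3, the Euclidean distance, and the absence of feature scaling are all assumptions, since the record does not state them.

```python
import random
from collections import Counter


def fuse(audio_vec, motion_vec):
    """Early fusion: concatenate audio and motion features into one vector."""
    return list(audio_vec) + list(motion_vec)


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, n_folds=10, k=3, seed=0):
    """Return the fraction of samples classified correctly over n_folds folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in idx if i not in held_out]
        train_y = [y[i] for i in idx if i not in held_out]
        correct += sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold
        )
    return correct / len(X)


def make_synthetic(n_per_class=60, n_classes=3, seed=0):
    """Synthetic stand-in data: 3 classes x 60 sequences = 180 samples."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            audio = [rng.gauss(c, 1.0) for _ in range(10)]  # stands in for PCA-reduced MFCCs
            motion = [rng.gauss(c, 1.0) for _ in range(4)]  # stands in for the motion features
            X.append(fuse(audio, motion))
            y.append(c)
    return X, y


X, y = make_synthetic()
accuracy = cross_validate(X, y)
print(f"10-fold accuracy on synthetic data: {accuracy:.4f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the reported results. Running the same loop on the audio columns or the motion columns alone would mirror the single-modality comparison in the abstract.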

Bibliographic Details
Subjects:
Notations: technical and natural sciences
Tagging: markerless
Published in: IEEE International Conference on Computer Vision (ICCV) Workshops, 2015, Santiago
Language: English
Published: 2015
Online Access: http://www.cv-foundation.org/openaccess/content_iccv_2015_workshops/w21/papers/Gade_Audio-Visual_Classification_of_ICCV_2015_paper.pdf
Pages: 768-773
Document types: congress proceedings
Level: advanced