
MSPENet: multi-scale adaptive fusion and position enhancement network for human pose estimation

Human pose estimation is a fundamental yet challenging task in computer vision. Recently, with the adoption of deep neural networks, human pose estimation has made great progress. However, existing pose estimation networks still struggle to detect small-scale keypoints and to distinguish semantically confusable keypoints. In this paper, a novel convolutional neural network named the multi-scale position enhancement network is proposed to address these two problems. First, a multi-scale adaptive fusion unit is proposed to dynamically select and fuse features at different scales, giving small-scale keypoints access to more of the detailed information that benefits their detection. Second, we observe that although body parts with similar appearance are difficult to distinguish semantically, they differ significantly in spatial location. A position enhancement module is therefore designed to highlight features at true joint locations while learning more discriminative features that suppress responses from similar-looking joint regions. Finally, a global context block is applied to refine the predictions and further improve performance. Experiments on both single- and multi-person pose estimation benchmarks show that our approach yields more accurate and reliable results.
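The abstract describes the multi-scale adaptive fusion unit only at a high level. As a rough illustration of the general idea (not the paper's actual architecture), the sketch below upsamples feature maps from several scales to a common resolution and combines them with softmax weights derived from a gating parameter; the names `adaptive_fuse` and `gate_w` are assumptions introduced here, and NumPy with nearest-neighbour upsampling stands in for a real deep-learning framework.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

def upsample_nearest(fmap, factor):
    # fmap: (C, H, W) -> (C, H*factor, W*factor), nearest-neighbour
    return fmap.repeat(factor, axis=1).repeat(factor, axis=2)

def adaptive_fuse(features, gate_w):
    """Illustrative multi-scale fusion with learned softmax weights.

    features: list of (C, H_i, W_i) arrays at different scales
              (spatial sizes assumed to divide the largest one).
    gate_w:   (len(features), C) gating parameters (hypothetical; in a
              real network these would be learned).
    """
    target_h = max(f.shape[1] for f in features)
    ups = [upsample_nearest(f, target_h // f.shape[1]) for f in features]
    # global average pooling per branch -> one scalar logit per scale
    logits = np.array([(gate_w[i] * f.mean(axis=(1, 2))).sum()
                       for i, f in enumerate(ups)])
    w = softmax(logits)  # adaptive weights over the scales, summing to 1
    return sum(wi * f for wi, f in zip(w, ups))
```

With equal inputs and equal gates, every scale receives weight 1/len(features), so the fusion reduces to an average; in general the gating lets the network emphasize the scale most informative for a given keypoint.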
© Copyright 2023 The Visual Computer. Springer. All rights reserved.

Bibliographic Details
Subjects:
Notations: technical and natural sciences
Tagging: algorithm; neural networks
Published in: The Visual Computer
Language: English
Published: 2023
Online Access: https://doi.org/10.1007/s00371-022-02460-y
Volume: 39
Issue: 5
Pages: 2005-2019
Document types: article
Level: advanced