Human pose estimation based on graph neural network: survey
Ramesh Kumar Lama, SeongKi Kim
IF 6.1
Journal of King Saud University - Computer and Information Sciences
Abstract: Human pose estimation is a fundamental task in computer vision with widespread applications in human–computer interaction, sports analytics, and healthcare. While convolutional neural networks (CNNs) and Transformers have achieved notable success, they often struggle to capture structured body relationships, handle occlusions, and generalize effectively across diverse environments. Graph Neural Networks (GNNs), which represent human poses as structured graphs, offer a compelling alternative by explicitly modeling spatial and temporal dependencies among body joints. This survey provides a comprehensive review of GNN-based pose estimation approaches, encompassing spatial GCNs, spatiotemporal models, graph–Transformer hybrids, and hypergraph frameworks. We analyze these methods along key dimensions, including graph construction, learning paradigms, attention mechanisms, and computational efficiency, using standard benchmarks such as Human3.6M, COCO, and MPI-INF-3DHP. Our review identifies several emerging trends and critical limitations, the latter including high computational cost, limited generalization to unconstrained scenarios, and inconsistent evaluation protocols. To advance the field, we outline future research directions, such as hybrid GNN–Transformer architectures, lightweight models for edge deployment, multi-modal fusion, and self-supervised learning strategies aimed at reducing annotation dependency and improving cross-domain robustness.
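As a rough illustration of what "representing a pose as a structured graph" can mean, the sketch below builds a skeleton graph and applies one graph-convolution step in plain Python. It is not taken from the survey: the five-joint skeleton, the coordinates, the weight matrix, and the helper names (`normalized_adjacency`, `gcn_layer`) are all hypothetical, and the symmetric normalization is the common GCN formulation rather than any specific method reviewed in the paper.

```python
import math

# Hypothetical 5-joint toy skeleton (assumption; not a benchmark's joint layout).
JOINTS = ["pelvis", "spine", "head", "l_shoulder", "r_shoulder"]
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]  # undirected edges between joint indices


def normalized_adjacency(num_joints, bones):
    """Return D^{-1/2} (A + I) D^{-1/2} as a list of rows."""
    # Start from the identity so every joint keeps its own features (self-loops).
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in bones:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, features, weights):
    """One graph convolution: ReLU(A_hat @ X @ W)."""
    mixed = matmul(a_hat, features)   # each joint aggregates its neighbors
    out = matmul(mixed, weights)      # shared linear map applied per joint
    return [[max(0.0, v) for v in row] for row in out]


# Made-up 2D joint coordinates used as input features.
X = [[0.0, 0.0], [0.0, 0.5], [0.0, 1.0], [-0.3, 0.8], [0.3, 0.8]]
# Illustrative 2 -> 3 weight matrix; a trained model would learn these values.
W = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]

A_HAT = normalized_adjacency(len(JOINTS), BONES)
H = gcn_layer(A_HAT, X, W)  # one 3-dimensional feature vector per joint
```

In this sketch, joints that share no bone (for example the pelvis and the head) have a zero entry in `A_HAT`, so information flows between them only after stacking several layers. That locality is one reason graph-based models are often paired with attention or Transformer components.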
https://doi.org/10.1007/s44443-025-00435-2
Pose
Convolutional neural network
Graph
Artificial neural network
Generalization
Deep learning
Feature learning
Dependency (UML)
Knowledge graph