zikele


Toward a Real-Time Framework for Accurate Monocular 3D Human Pose Estimation with Geometric Priors

2507.16850v1

Title#

Toward a Real-Time Framework for Accurate Monocular 3D Human Pose Estimation with Geometric Priors

Abstract#

Monocular 3D human pose estimation remains a challenging and ill-posed problem, particularly in real-time settings and unconstrained environments. While direct image-to-3D approaches require large annotated datasets and heavy models, 2D-to-3D lifting offers a more lightweight and flexible alternative, especially when enhanced with prior knowledge. In this work, we propose a framework that combines real-time 2D keypoint detection with geometry-aware 2D-to-3D lifting, explicitly leveraging known camera intrinsics and subject-specific anatomical priors. Our approach builds on recent advances in self-calibration and biomechanically constrained inverse kinematics to generate large-scale, plausible 2D-3D training pairs from MoCap and synthetic datasets. We discuss how these ingredients can enable fast, personalized, and accurate 3D pose estimation from monocular images without requiring specialized hardware. This proposal aims to foster discussion on bridging data-driven learning and model-based priors to improve the accuracy, interpretability, and deployability of 3D human motion capture on edge devices in the wild.
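The abstract's "explicitly leveraging known camera intrinsics" typically refers to mapping pixel keypoints into normalized camera coordinates before lifting. The sketch below illustrates that standard preprocessing step with NumPy; it is a hypothetical illustration of the general technique, not code from the paper, and the function name and example intrinsics matrix are assumptions.

```python
import numpy as np

def normalize_keypoints(kp_px: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Map (N, 2) pixel keypoints to intrinsics-normalized coordinates.

    Each pixel (u, v) is lifted to the homogeneous ray K^{-1} [u, v, 1]^T,
    removing focal length and principal point from the input so a lifting
    network sees a camera-independent representation.
    (Illustrative sketch; not the paper's implementation.)
    """
    ones = np.ones((kp_px.shape[0], 1))
    kp_h = np.hstack([kp_px, ones])       # (N, 3) homogeneous pixel coords
    rays = kp_h @ np.linalg.inv(K).T      # (N, 3) rays in camera frame
    return rays[:, :2] / rays[:, 2:3]     # intersect with the z = 1 plane

# Example: pinhole camera with 500 px focal length, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
kp = np.array([[320.0, 240.0],    # principal point maps to (0, 0)
               [420.0, 240.0]])   # 100 px right of center maps to (0.2, 0)
print(normalize_keypoints(kp, K))
```

Working in these normalized coordinates is what lets a single lifting model generalize across cameras with different focal lengths, which is one motivation for requiring known intrinsics in the first place.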

PDF Access#

View the Chinese PDF - 2507.16850v1
