A Real-Time Framework for Accurate Monocular 3D Human Pose Estimation Based on Geometric Priors

2507.16850v1

Title (translated from Japanese)

A Real-Time Framework for Accurate Monocular 3D Human Pose Estimation Using Geometric Priors

English Title

Toward a Real-Time Framework for Accurate Monocular 3D Human Pose Estimation with Geometric Priors

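Abstract (translated from Japanese)

Monocular 3D human pose estimation remains a challenging, ill-posed problem, especially in real-time settings and unconstrained environments. Direct image-to-3D approaches require large amounts of annotated data and heavy models, whereas 2D-to-3D lifting offers a lighter and more flexible alternative, particularly when combined with prior knowledge. In this work, we propose a framework that combines real-time 2D keypoint detection with geometry-aware 2D-to-3D lifting, explicitly exploiting known camera intrinsic parameters and subject-specific anatomical priors. Our approach builds on recent advances in self-calibration and biomechanically constrained inverse kinematics to generate large-scale, plausible 2D-3D training pairs from motion capture and synthetic datasets. We discuss how these components enable fast, personalized, and accurate 3D pose estimation from monocular images without specialized hardware. This proposal aims to bridge data-driven learning and model-based priors and to encourage discussion on improving the accuracy, interpretability, and deployability of 3D human motion capture on edge devices.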

English Abstract

Monocular 3D human pose estimation remains a challenging and ill-posed problem, particularly in real-time settings and unconstrained environments. While direct image-to-3D approaches require large annotated datasets and heavy models, 2D-to-3D lifting offers a more lightweight and flexible alternative, especially when enhanced with prior knowledge. In this work, we propose a framework that combines real-time 2D keypoint detection with geometry-aware 2D-to-3D lifting, explicitly leveraging known camera intrinsics and subject-specific anatomical priors. Our approach builds on recent advances in self-calibration and biomechanically constrained inverse kinematics to generate large-scale, plausible 2D-3D training pairs from MoCap and synthetic datasets. We discuss how these ingredients can enable fast, personalized, and accurate 3D pose estimation from monocular images without requiring specialized hardware. This proposal aims to foster discussion on bridging data-driven learning and model-based priors to improve accuracy, interpretability, and deployability of 3D human motion capture on edge devices in the wild.
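As a rough illustration of why known camera intrinsics and a subject-specific bone length help with lifting, the Python sketch below back-projects a 2D keypoint to a viewing ray and intersects that ray with a sphere of known bone length around an already-lifted parent joint. This is not the authors' method. The function names, the distortion-free pinhole model, and the example numbers are all assumptions made for illustration.

```python
import math


def backproject(u, v, fx, fy, cx, cy):
    """Unit ray through pixel (u, v) for an ideal pinhole camera (no distortion)."""
    x = (u - cx) / fx
    y = (v - cy) / fy
    n = math.sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, 1.0 / n)


def child_depth_candidates(parent, ray, bone_length):
    """Solve |t * ray - parent| = bone_length for the distance t along the ray.

    With |ray| = 1 this is t^2 - 2 (ray . parent) t + |parent|^2 - L^2 = 0.
    Returns zero, one, or two positive roots; two roots correspond to the
    front/back depth ambiguity of monocular lifting.
    """
    b = sum(r * p for r, p in zip(ray, parent))
    c = sum(p * p for p in parent) - bone_length ** 2
    disc = b * b - c
    if disc < 0:
        return []  # the ray never comes within bone_length of the parent
    root = math.sqrt(disc)
    return [t for t in sorted({b - root, b + root}) if t > 0]


def lift_child(parent, uv, intrinsics, bone_length):
    """Candidate 3D positions (camera frame) of a child joint seen at pixel uv."""
    ray = backproject(uv[0], uv[1], *intrinsics)
    depths = child_depth_candidates(parent, ray, bone_length)
    return [tuple(t * r for r in ray) for t in depths]


if __name__ == "__main__":
    # Hypothetical intrinsics (fx, fy, cx, cy) in pixels and a 0.3 m bone.
    K = (1000.0, 1000.0, 640.0, 360.0)
    parent_joint = (0.0, 0.0, 3.0)  # parent joint 3 m in front of the camera
    for candidate in lift_child(parent_joint, (740.0, 360.0), K, 0.3):
        print(candidate)
```

The bone-length constraint narrows an unbounded ray down to at most two candidate positions. Choosing between them would need further priors such as joint limits or temporal continuity, which this sketch leaves out.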
