Abstract
Full-body egocentric pose estimation from head and hand poses alone has become an active area of research to power articulated avatar representations on headset-based platforms. However, existing methods over-rely on the indoor motion-capture spaces in which their datasets were recorded, while simultaneously assuming continuous joint motion capture and uniform body dimensions. We propose EgoPoser to overcome these limitations with four main contributions. 1) EgoPoser robustly models body pose from hand position and orientation input that is only intermittently available, i.e., when the hands are inside the headset's field of view. 2) We rethink input representations for headset-based ego-pose estimation and introduce a novel global motion decomposition method that predicts full-body pose independent of global position. 3) We capture longer motion time series through a SlowFast module design that improves pose estimation while maintaining computational efficiency. 4) EgoPoser generalizes across the varying body shapes of different users. We experimentally evaluate our method and show that it outperforms state-of-the-art methods both qualitatively and quantitatively while maintaining a high inference speed of over 600 fps. EgoPoser establishes a robust baseline for future work where full-body pose estimation no longer needs to rely on outside-in capture and can scale to large-scale and unseen environments.
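The abstract names three concrete mechanisms: field-of-view-aware intermittent hand input, a position-invariant global motion decomposition, and two-rate (SlowFast) temporal sampling. The NumPy sketch below illustrates these ideas under our own assumptions; it is not the authors' implementation, and the function names, tensor shapes, the -z viewing convention, and the fast_len/stride/half_fov_deg parameters are all illustrative.

```python
import numpy as np

def decompose_global_motion(head_pos, hand_pos):
    """Split global input into a position-invariant part plus a velocity cue.
    head_pos: (T, 3) world positions; hand_pos: (T, 2, 3) for two hands."""
    anchor = head_pos.copy()
    anchor[:, 1] = 0.0                       # drop horizontal position, keep height
    rel_head = head_pos - anchor             # (T, 3): only the head height survives
    rel_hands = hand_pos - anchor[:, None]   # (T, 2, 3): hands relative to the head column
    head_vel = np.diff(head_pos, axis=0, prepend=head_pos[:1])  # (T, 3) global motion cue
    return rel_head, rel_hands, head_vel

def slowfast_windows(features, fast_len=40, stride=4):
    """Two-rate temporal sampling: a dense recent window ('fast') plus a
    coarsely subsampled long history ('slow') at similar compute cost."""
    fast = features[-fast_len:]
    slow = features[::stride]
    return slow, fast

def mask_out_of_view(hands_in_head, half_fov_deg=55.0):
    """Zero out hand input whenever a hand leaves an assumed conical field of
    view around the head's viewing direction (-z here), mimicking the
    intermittent availability of inside-out hand tracking."""
    fwd = np.array([0.0, 0.0, -1.0])
    d = hands_in_head / (np.linalg.norm(hands_in_head, axis=-1, keepdims=True) + 1e-8)
    cos_ang = d @ fwd                        # (T, 2) cosine to the viewing direction
    visible = cos_ang > np.cos(np.deg2rad(half_fov_deg))
    return hands_in_head * visible[..., None], visible
```

In a full pipeline, the slow and fast streams would be encoded separately and fused before a pose decoder; here they are only returned to show the sampling pattern.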
Acknowledgement
We sincerely thank Andreas Fender for his help with data recording, testing, and manuscript proofreading.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Jiang, J., Streli, P., Meier, M., Holz, C. (2025). EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15060. Springer, Cham. https://doi.org/10.1007/978-3-031-72627-9_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72626-2
Online ISBN: 978-3-031-72627-9
eBook Packages: Computer Science; Computer Science (R0); Springer Nature Proceedings Computer Science