{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T19:07:18Z","timestamp":1774033638805,"version":"3.50.1"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2020,4,17]],"date-time":"2020-04-17T00:00:00Z","timestamp":1587081600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Honda Research Institute Europe"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2020,6,30]]},"abstract":"<jats:p>Data for sign language research is often difficult and costly to acquire. We therefore present a novel pipeline able to generate motion three-dimensional (3D) skeleton data from single-camera sign language videos only. First, three recurrent neural networks are learned to infer the three-dimensional position data of body, face, and finger joints for a high resolution of the signer\u2019s skeleton. Subsequently, the angular displacements of all joints over time are estimated using inverse kinematics and mapped to a virtual sign avatar for animation. Last, the generated data are evaluated in detail, including a sign language recognition and sign language synthesis scenario. Utilizing a neural word classifier trained on real motion capture data, we reliably classify word segments built from our newly generated position data with similar accuracy as motion capture data (absolute difference 3.8%). 
Furthermore, qualitative evaluation of sign animations shows that the avatar performs natural movements that are comprehensible and resemble animations created with original motion capture data.<\/jats:p>","DOI":"10.1145\/3377552","type":"journal-article","created":{"date-parts":[[2020,5,4]],"date-time":"2020-05-04T07:06:58Z","timestamp":1588576018000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Learning Three-dimensional Skeleton Data from Sign Language Video"],"prefix":"10.1145","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9530-4714","authenticated-orcid":false,"given":"Heike","family":"Brock","sequence":"first","affiliation":[{"name":"Honda Research Institute, Saitama, Japan"}]},{"given":"Felix","family":"Law","sequence":"additional","affiliation":[{"name":"University of British Columbia, Vancouver, Canada"}]},{"given":"Kazuhiro","family":"Nakadai","sequence":"additional","affiliation":[{"name":"Honda Research Institute, Saitama, Japan"}]},{"given":"Yuji","family":"Nagashima","sequence":"additional","affiliation":[{"name":"Kougakuin University, Tokyo, Japan"}]}],"member":"320","published-online":{"date-parts":[[2020,4,17]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Jamie Ryan Kiros, and Geoffrey E. Hinton","author":"Ba Jimmy Lei","year":"2016","unstructured":"Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. arXiv preprint arXiv:1607.06450 (2016)."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ROMAN.2018.8525717"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC\u201918)","author":"Brock Heike","year":"2018","unstructured":"Heike Brock and Kazuhiro Nakadai. 2018. 
Deep JSLC: A multimodal corpus collection for data-driven generation of Japanese sign language expressions. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC\u201918). European Language Resources Association (ELRA)."},{"key":"e_1_2_1_4_1","unstructured":"Heike Brock, Juliette Rengot, and Kazuhiro Nakadai. 2018. Augmenting sparse corpora for enhanced sign language recognition and generation. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018) and the 8th Workshop on the Representation and Processing of Sign Languages: Involving the Language Community. 
European Language Resources Association (ELRA)."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-011-0480-9"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.332"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.143"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.143"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-013-0672-6"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1162\/NECO_a_00052"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/11919476_50"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2008.4543616"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10209-015-0408-1"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2007.383346"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the International Conference on Language Resources and Evaluation (LREC\u201914)","author":"Forster Jens","year":"2014","unstructured":"Jens Forster, Christoph Schmidt, Oscar Koller, Martin Bellgardt, and Hermann Ney. 2014. Extensions of the sign language recognition and translation corpus RWTH-PHOENIX-weather. In Proceedings of the International Conference on Language Resources and Evaluation (LREC\u201914). 
1911--1916."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2910674.2910716"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6638947"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1049\/iet-cvi:20080006"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1823738.1823740"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11903"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1414471.1414496"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10209-007-0095-7"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1361203.1361206"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.248"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.117"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1196"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3046787"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23974-8_13"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049536.2049557"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-018-1121-3"},{"key":"e_1_2_1_31_1","volume-title":"Hinton","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105."},{"key":"e_1_2_1_32_1","volume-title":"Kuipers et al","author":"Jack","year":"1999","unstructured":"Jack B. Kuipers et al. 1999. Quaternions and Rotation Sequences. Vol. 66. 
Princeton University Press, Princeton, NJ."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2018.2817179"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 3rd International Symposium on Sign Language Translation and Avatar Technology (SLTAT\u201913)","author":"Lefebvre-Albaret Fran\u00e7ois","year":"2013","unstructured":"Fran\u00e7ois Lefebvre-Albaret, Sylvie Gibet, Ahmed Turki, Ludovic Hamon, and R\u00e9mi Brun. 2013. Overview of the Sign3D project high-fidelity 3D recording, indexing and editing of French sign language content. In Proceedings of the 3rd International Symposium on Sign Language Translation and Avatar Technology (SLTAT\u201913) 2013."},{"key":"e_1_2_1_35_1","volume-title":"Chan","author":"Li Sijin","year":"2014","unstructured":"Sijin Li and Antoni B. Chan. 2014. 3d human pose estimation from monocular images with deep convolutional neural network. In Proceedings of the Asian Conference on Computer Vision. Springer, 332--347."},{"key":"e_1_2_1_36_1","volume-title":"Juhyun Lee, et al.","author":"Lugaresi Camillo","year":"2019","unstructured":"Camillo Lugaresi, Jiuqiang Tang, Hadon Nash, Chris McClanahan, Esha Uboweja, Michael Hays, Fan Zhang, Chuo-Ling Chang, Ming Guang Yong, Juhyun Lee, et al. 2019. 
MediaPipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172 (2019)."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the IEEE International Conference on Computer Vision. 2640--2649","author":"Martinez Julieta","unstructured":"Julieta Martinez, Rayat Hossain, Javier Romero, and James J. Little. 2017. A simple yet effective baseline for 3d human pose estimation. In Proceedings of the IEEE International Conference on Computer Vision. 2640--2649."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10209-015-0407-2"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073596"},{"key":"e_1_2_1_40_1","first-page":"241","article-title":"Motion capture file formats explained. Department of Computer Science","volume":"211","author":"Meredith Maddock","year":"2001","unstructured":"Maddock Meredith, Steve Maddock, et al. 2001. Motion capture file formats explained. 
Department of Computer Science, University of Sheffield 211 (2001), 241--244.","journal-title":"University of Sheffield"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1006\/cviu.2000.0897"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISM.2016.0063"},{"key":"e_1_2_1_43_1","first-page":"33","article-title":"The uncanny valley","volume":"7","author":"Mori Masahiro","year":"1970","unstructured":"Masahiro Mori. 1970. The uncanny valley. Energy 7, 4 (1970), 33--35.","journal-title":"Energy"},{"key":"e_1_2_1_44_1","unstructured":"The University of Wisconsin. 1999. Biovision BVH. Retrieved from http:\/\/research.cs.wisc.edu\/graphics\/Courses\/cs-838-1999\/Jeff\/BVH.html."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.139"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2016.09.002"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2579698"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2003.1227801"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.494"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/1753326.1753481"},{"key":"e_1_2_1_51_1","volume-title":"Le","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems. 
3104--3112."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.425"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3377552","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3377552","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:33:18Z","timestamp":1750199598000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3377552"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,4,17]]},"references-count":52,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,6,30]]}},"alternative-id":["10.1145\/3377552"],"URL":"https:\/\/doi.org\/10.1145\/3377552","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"value":"2157-6904","type":"print"},{"value":"2157-6912","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,4,17]]},"assertion":[{"value":"2019-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-04-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}