{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T21:31:22Z","timestamp":1767994282773,"version":"3.49.0"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2022,6,6]],"date-time":"2022-06-06T00:00:00Z","timestamp":1654473600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation","award":["DBI-1707408"],"award-info":[{"award-number":["DBI-1707408"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>\n            Ever-growing edge applications often require short processing latency and high energy efficiency to meet strict timing and power budget. In this work, we propose that the compact long short-term memory (LSTM) model can approximate conventional\n            <jats:italic>acausal<\/jats:italic>\n            algorithms with reduced latency and improved efficiency for real-time causal prediction, especially for the neural signal processing in closed-loop feedback applications. We design an LSTM inference accelerator by taking advantage of the fine-grained parallelism and pipelined feedforward and recurrent updates. We also propose a bit-sparse quantization method that can reduce the circuit area and power consumption by replacing the multipliers with the bit-shift operators. We explore different combinations of pruning and quantization methods for energy-efficient LSTM inference on datasets collected from the electroencephalogram (EEG) and calcium image processing applications. Evaluation results show that our proposed LSTM inference accelerator can achieve 1.19 GOPS\/mW energy efficiency. The LSTM accelerator with 2-sbit\/16-bit sparse quantization and 60% sparsity can reduce the circuit area and power consumption by 54.1% and 56.3%, respectively, compared with a 16-bit baseline implementation.\n          <\/jats:p>","DOI":"10.1145\/3495006","type":"journal-article","created":{"date-parts":[[2022,2,24]],"date-time":"2022-02-24T17:13:41Z","timestamp":1645722821000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Energy-Efficient LSTM Inference Accelerator for Real-Time Causal Prediction"],"prefix":"10.1145","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5371-2058","authenticated-orcid":false,"given":"Zhe","family":"Chen","sequence":"first","affiliation":[{"name":"Computer Science Department, UCLA, Los Angeles, CA, USA"}]},{"given":"Hugh T.","family":"Blair","sequence":"additional","affiliation":[{"name":"Department of Psychology, UCLA, Los Angeles, CA, USA"}]},{"given":"Jason","family":"Cong","sequence":"additional","affiliation":[{"name":"Computer Science Department, UCLA, Los Angeles, CA, USA"}]}],"member":"320","published-online":{"date-parts":[[2022,6,6]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-018-0266-x"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1093\/acprof:oso\/9780195301069.001.0001"},{"key":"e_1_3_2_4_2","first-page":"1","volume-title":"IEEE Int. Symp. Circuits Syst. 
(ISCAS)","author":"Chang Andre Xian Ming","year":"2017","unstructured":"Andre Xian Ming Chang and Eugenio Culurciello. 2017. Hardware accelerators for recurrent neural networks on FPGA. In IEEE Int. Symp. Circuits Syst. (ISCAS). 1\u20134."},{"key":"e_1_3_2_5_2","first-page":"259","volume-title":"IEEE Eur. Solid State Circuits Conf. (ESSCC)","author":"Chen Chixiao","year":"2017","unstructured":"Chixiao Chen, Hongwei Dong, Huwan Peng, Haozhe Zhu, Rui Ma, Peiyong Zhang, Xiaolang Yan, Yu Wang, Mingyu Wang, Hao Min, and Richard C.-J. Shi. 2017. OCEAN: An on-chip incremental-learning enhanced processor with gated recurrent neural network accelerators. In IEEE Eur. Solid State Circuits Conf. (ESSCC). 259\u2013262."},{"key":"e_1_3_2_6_2","first-page":"217","volume-title":"Proc. Int. Symp. Low Power Electron. Des. (ISLPED)","author":"Chen Zhe","year":"2020","unstructured":"Zhe Chen, Garrett J. Blair, Hugh T. Blair, and Jason Cong. 2020. BLINK: Bit-sparse LSTM inference kernel enabling efficient calcium trace extraction for neurofeedback devices. In Proc. Int. Symp. Low Power Electron. Des. (ISLPED). 217\u2013222."},{"key":"e_1_3_2_7_2","first-page":"104","volume-title":"Proc. Int. Symp. Field-Programmable Gate Arrays (FPGA)","author":"Chen Zhe","year":"2019","unstructured":"Zhe Chen, Hugh T. Blair, and Jason Cong. 2019. LANMC: LSTM-assisted non-rigid motion correction on FPGA for calcium image stabilization. In Proc. Int. Symp. Field-Programmable Gate Arrays (FPGA). ACM, 104\u2013109."},{"key":"e_1_3_2_8_2","first-page":"2:1\u20132:6","volume-title":"Proc. Int. Symp. Low Power Electron. Des. (ISLPED)","author":"Chen Zhe","year":"2018","unstructured":"Zhe Chen, Andrew Howe, Hugh T. Blair, and Jason Cong. 2018. CLINK: Compact LSTM inference kernel for energy efficient neurofeedback devices. In Proc. Int. Symp. Low Power Electron. Des. (ISLPED). 2:1\u20132:6."},{"key":"e_1_3_2_9_2","first-page":"1","volume-title":"IEEE Custom Integrated Circuits Conf. (CICC\u201918)","author":"Conti Francesco","year":"2018","unstructured":"Francesco Conti, Lukas Cavigelli, Gianna Paulin, Igor Susmelj, and Luca Benini. 2018. CHIPMUNK: A systolically scalable 0.9 mm 2, 3.08Gop\/s\/mW @ 1.2 mW accelerator for near-sensor recurrent neural network inference. In IEEE Custom Integrated Circuits Conf. (CICC\u201918). 1\u20134."},{"key":"e_1_3_2_10_2","doi-asserted-by":"crossref","unstructured":"Chang Gao Tobi Delbruck and Shih-Chii Liu. 2021. Spartus: A 9.4 TOp\/s FPGA-based LSTM accelerator exploiting spatio-temporal sparsity. (2021). arxiv:2108.02297.","DOI":"10.1109\/TNNLS.2022.3180209"},{"key":"e_1_3_2_11_2","first-page":"21","volume-title":"Proc. ACM\/SIGDA Int. Symp. Field-Programmable Gate Arrays (FPGA)","author":"Gao Chang","year":"2018","unstructured":"Chang Gao, Daniel Neil, Enea Ceolini, Shih-Chii Liu, and Tobi Delbruck. 2018. DeltaRNN: A power-efficient recurrent neural network accelerator. In Proc. ACM\/SIGDA Int. Symp. Field-Programmable Gate Arrays (FPGA). ACM, 21\u201330."},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1038\/nmeth.1694"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.38173"},{"key":"e_1_3_2_14_2","first-page":"2378","volume-title":"Adv. Neural Inf. Process. Syst. (NIPS)","author":"Giovannucci Andrea","year":"2017","unstructured":"Andrea Giovannucci, Johannes Friedrich, Matthew Kaufman, Anne K. Churchland, Dmitri Chklovskii, and Liam Paninski. 2017. OnACID: Online analysis of calcium imaging data in real time. In Adv. Neural Inf. Process. Syst. (NIPS). 
2378\u20132388."},{"key":"e_1_3_2_15_2","first-page":"166","volume-title":"IEEE Eur. Solid State Circuits Conf. (ESSCIRC)","author":"Giraldo J. S. P.","year":"2018","unstructured":"J. S. P. Giraldo and Marian Verhelst. 2018. Laika: A 5uW programmable LSTM accelerator for always-on keyword spotting in 65 nm CMOS. In IEEE Eur. Solid State Circuits Conf. (ESSCIRC). 166\u2013169."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2008.137"},{"key":"e_1_3_2_17_2","first-page":"6645","volume-title":"IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP)","author":"Graves Alex","year":"2013","unstructured":"Alex Graves, Abdel Rahman Mohamed, and Geoffrey Hinton. 2013. Speech recognition with deep recurrent neural networks. In IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP). 6645\u20136649."},{"key":"e_1_3_2_18_2","first-page":"152","volume-title":"IEEE 25th Annu. Int. Symp. Field-Programmable Cust. Comput. Mach. (FCCM)","author":"Guan Yijin","year":"2017","unstructured":"Yijin Guan, Hao Liang, Ningyi Xu, Wenqiang Wang, Shaoshuai Shi, Xi Chen, Guangyu Sun, Wei Zhang, and Jason Cong. 2017. FP-DNN: An automated framework for mapping deep neural networks onto FPGAs with RTL-HLS hybrid templates. In IEEE 25th Annu. Int. Symp. Field-Programmable Cust. Comput. Mach. (FCCM). 152\u2013159."},{"key":"e_1_3_2_19_2","first-page":"629","volume-title":"22nd Asia South Pacific Des. Autom. Conf. (ASP-DAC)","author":"Guan Yijin","year":"2017","unstructured":"Yijin Guan, Zhihang Yuan, Guangyu Sun, and Jason Cong. 2017. FPGA-based accelerator for long short-term memory recurrent neural networks. In 22nd Asia South Pacific Des. Autom. Conf. (ASP-DAC). 629\u2013634."},{"key":"e_1_3_2_20_2","first-page":"75","volume-title":"Proc. ACM\/SIGDA Int. Symp. Field-Programmable Gate Arrays (FPGA)","author":"Han Song","year":"2017","unstructured":"Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, and William J. Dally. 2017. ESE: Efficient speech recognition engine with sparse LSTM on FPGA. In Proc. ACM\/SIGDA Int. Symp. Field-Programmable Gate Arrays (FPGA). 75\u201384."},{"key":"e_1_3_2_21_2","volume-title":"Proc. Int. Symp. Comput. Archit. (ISCA)","author":"Han Song","year":"2016","unstructured":"Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient inference engine on compressed deep neural network. In Proc. Int. Symp. Comput. Archit. (ISCA). 243\u2013254."},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.5555\/3019264"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2020.2992900"},{"key":"e_1_3_2_25_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41592-019-0493-9"},{"key":"e_1_3_2_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/MEMB.2005.1511503"},{"key":"e_1_3_2_27_2","first-page":"218","volume-title":"IEEE Int. Solid-State Circuits Conf. (ISSCC)","author":"Lee Jinmook","year":"2018","unstructured":"Jinmook Lee, Changhyeon Kim, Sanghoon Kang, Dongjoo Shin, Sangyeob Kim, and Hoi-Jun Yoo. 2018. UNPU: A 50.6TOPS\/W unified deep neural network accelerator with 1b-to-16b fully-variable weight bit-precision. In IEEE Int. Solid-State Circuits Conf. (ISSCC). 218\u2013220."},{"key":"e_1_3_2_28_2","first-page":"237","volume-title":"IEEE Asian Solid-State Circuits Conf. 
(ASSCC)","author":"Lee Jinmook","year":"2017","unstructured":"Jinmook Lee, Dongjoo Shin, and Hoi-Jun Yoo. 2017. A 21mW low-power recurrent neural network accelerator with quantization tables for embedded deep learning applications. In IEEE Asian Solid-State Circuits Conf. (ASSCC). 237\u2013240."},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.celrep.2018.05.062"},{"key":"e_1_3_2_30_2","first-page":"e1098","article-title":"Micro-drive array for chronic in vivo recording: Tetrode assembly.","volume":"26","author":"Nguyen David P.","year":"2009","unstructured":"David P. Nguyen, Stuart P. Layton, Gregory Hale, Stephen N. Gomperts, Thomas J. Davidson, Fabian Kloosterman, and Matthew A. Wilson. 2009. Micro-drive array for chronic in vivo recording: Tetrode assembly.J. Vis. Exp. 26 (2009), e1098.","journal-title":"J. Vis. Exp."},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jneumeth.2017.07.031"},{"key":"e_1_3_2_32_2","first-page":"20","volume-title":"Int. Conf. Field-Programmable Technol. (FPT)","author":"Que Zhiqiang","year":"2020","unstructured":"Zhiqiang Que, Hiroki Nakahara, Hongxiang Fan, Jiuxi Meng, Kuen Huang Tsoi, Xinyu Niu, Eriko Nurvitadhi, and Wayne Luk. 2020. A reconfigurable multithreaded accelerator for recurrent neural networks. In Int. Conf. Field-Programmable Technol. (FPT). 20\u201328."},{"key":"e_1_3_2_33_2","doi-asserted-by":"crossref","unstructured":"Vladimir Rybalkin Alessandro Pappalardo Muhammad Mohsin Ghaffar Giulio Gambardella Norbert Wehn and Michaela Blott. 2018. FINN-L: Library extensions and design trade-off analysis for variable precision LSTM networks on FPGAs. In Int. Conf. Field-Programmable Log. Appl. (FPL) . 890\u2013897.","DOI":"10.1109\/FPL.2018.00024"},{"key":"e_1_3_2_34_2","first-page":"1390","volume-title":"Des. Autom. Test Eur. Conf. Exhib. (DATE)","author":"Rybalkin Vladimir","year":"2017","unstructured":"Vladimir Rybalkin, Norbert Wehn, Mohammad Reza Yousefi, and Didier Stricker. 2017. Hardware architecture of bidirectional long short-term memory neural network for optical character recognition. In Des. Autom. Test Eur. Conf. Exhib. (DATE). 1390\u20131395."},{"key":"e_1_3_2_35_2","first-page":"240","volume-title":"IEEE Int. Solid-State Circuits Conf. (ISSCC)","author":"Shin Dongjoo","year":"2017","unstructured":"Dongjoo Shin, Jinmook Lee, Jinsu Lee, and Hoi-Jun Yoo. 2017. DNPU: An 8.1TOPS\/W reconfigurable CNN-RNN processor for general-purpose deep neural networks. In IEEE Int. Solid-State Circuits Conf. (ISSCC). 240\u2013241."},{"key":"e_1_3_2_36_2","first-page":"11","volume-title":"Proc. ACM\/SIGDA Int. Symp. Field-Programmable Gate Arrays (FPGA)","author":"Wang Shuo","year":"2018","unstructured":"Shuo Wang, Zhe Li, Caiwen Ding, Bo Yuan, Qinru Qiu, Yanzhi Wang, and Yun Liang. 2018. C-LSTM: Enabling efficient LSTM using structured compression techniques on FPGAs. In Proc. ACM\/SIGDA Int. Symp. Field-Programmable Gate Arrays (FPGA). 11\u201320."},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TVLSI.2017.2717950"},{"key":"e_1_3_2_38_2","first-page":"1","volume-title":"Int. Conf. Field-Programable Log. Appl. (FPL)","author":"Zhang Xiaofan","year":"2017","unstructured":"Xiaofan Zhang, Xinheng Liu, Anand Ramachandran, Chuanhao Zhuge, Shibin Tang, Peng Ouyang, Zuofu Cheng, Kyle Rupnow, and Deming Chen. 2017. High-performance video content recognition with long-term recurrent convolutional network for FPGA. In Int. Conf. Field-Programable Log. Appl. (FPL). 
1\u20134."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.7554\/eLife.28728"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3495006","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3495006","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:49:19Z","timestamp":1750193359000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3495006"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,6]]},"references-count":38,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3495006"],"URL":"https:\/\/doi.org\/10.1145\/3495006","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"value":"1084-4309","type":"print"},{"value":"1557-7309","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,6]]},"assertion":[{"value":"2021-07-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-06-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}