{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T17:20:21Z","timestamp":1774718421787,"version":"3.50.1"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","funder":[{"name":"Global Research Support Program in the Digital Field program","award":["RS-2024-00431397"],"award-info":[{"award-number":["RS-2024-00431397"]}]},{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"crossref","award":["RS-2023-00208046"],"award-info":[{"award-number":["RS-2023-00208046"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>State Space Model (SSM)-based machine learning architectures have recently gained significant attention for processing sequential data. Mamba, a recent sequence-to-sequence SSM, offers competitive accuracy with superior computational efficiency compared to state-of-the-art transformer models. While this advantage makes Mamba particularly promising for resource-constrained edge devices, no hardware acceleration frameworks are currently optimized for deploying it in such environments. This article presents eMamba, a comprehensive end-to-end hardware acceleration framework explicitly designed for deploying Mamba models on edge platforms. eMamba maximizes computational efficiency by replacing complex normalization layers with lightweight hardware-aware alternatives and approximating expensive operations, such as SiLU activation and exponentiation, considering the target applications. Then, it performs an approximation-aware neural architecture search (NAS) to tune the learnable parameters used during approximation. Evaluations with Fashion-MNIST, CIFAR-10, and MARS, an open-source human pose estimation dataset, show eMamba achieves comparable accuracy to state-of-the-art techniques using 1.63\u201319.9\u00d7 fewer parameters. In addition, it generalizes well to large-scale natural language tasks, demonstrating stable perplexity across varying sequence lengths on the WikiText2 dataset. We also quantize and implement the entire eMamba pipeline on an AMD ZCU102 FPGA and ASIC using GlobalFoundries (GF) 22\u00a0nm technology. 
Experimental results show 4.95\u20135.62\u00d7 lower latency and 2.22\u20139.95\u00d7 higher throughput, with 4.77\u00d7 smaller area, 9.84\u00d7 lower power, and 48.6\u00d7 lower energy consumption than baseline solutions while maintaining competitive accuracy.<\/jats:p>","DOI":"10.1145\/3762190","type":"journal-article","created":{"date-parts":[[2025,8,19]],"date-time":"2025-08-19T11:28:08Z","timestamp":1755602888000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["eMamba: Efficient Acceleration Framework for Mamba Models in Edge Computing"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-6857-755X","authenticated-orcid":false,"given":"Jiyong","family":"Kim","sequence":"first","affiliation":[{"name":"Department of Electrical, Electronic and Computer Engineering, University of Ulsan","place":["Ulsan, Korea (the Republic of)"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-5434-7791","authenticated-orcid":false,"given":"Jaeho","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Electrical, Electronic and Computer Engineering, University of Ulsan","place":["Ulsan, Korea (the Republic of)"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-3618-2385","authenticated-orcid":false,"given":"Jiahao","family":"Lin","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison","place":["Madison, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-8585-9241","authenticated-orcid":false,"given":"Alish","family":"Kanani","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Wisconsin-Madison","place":["Madison, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4537-6998","authenticated-orcid":false,"given":"Sun","family":"Miao","sequence":"additional","affiliation":[{"name":"University of Wisconsin-Madison","place":["Madison, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5045-5535","authenticated-orcid":false,"given":"Umit","family":"Ogras","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, University of Wisconsin-Madison","place":["Madison, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2276-4998","authenticated-orcid":false,"given":"Jaehyun","family":"Park","sequence":"additional","affiliation":[{"name":"Department of Electrical, Electronic and Computer Engineering, University of Ulsan","place":["Ulsan, Korea (the Republic of)"]}]}],"member":"320","published-online":{"date-parts":[[2025,9,26]]},"reference":[{"key":"e_1_3_1_2_2","volume-title":"Fundamentals of Electric Circuits","author":"Alexander Charles K","year":"2007","unstructured":"Charles K Alexander, Matthew NO Sadiku, and Matthew Sadiku. 2007. Fundamentals of Electric Circuits. McGraw-Hill Higher Education Boston, MA, USA."},{"key":"e_1_3_1_3_2","unstructured":"AMD Inc.2023. ZCU102 Evaluation Board User Guide (UG1182). (2023). Retrieved from https:\/\/docs.amd.com\/v\/u\/en-US\/ug1182-zcu102-eval-bd"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3477003"},{"key":"e_1_3_1_5_2","unstructured":"Jimmy Lei Ba Jamie Ryan Kiros and Geoffrey E. Hinton. 2016. Layer normalization. arXiv:1607.06450. 
{"key":"e_1_3_1_6_2","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","volume":"31","author":"Banner Ron","year":"2018","unstructured":"Ron Banner, Itay Hubara, Elad Hoffer, and Daniel Soudry. 2018. Scalable methods for 8-bit training of neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 31. Curran Associates, Inc."},
{"key":"e_1_3_1_7_2","volume-title":"Modern Control Systems","author":"Dorf Richard C","year":"2011","unstructured":"Richard C. Dorf and Robert H. Bishop. 2011. Modern Control Systems."},
{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3242897"},
{"key":"e_1_3_1_9_2","first-page":"4396","article-title":"QuIP: 2-bit quantization of large language models with guarantees","volume":"36","author":"Chee Jerry","year":"2023","unstructured":"Jerry Chee, Yaohui Cai, Volodymyr Kuleshov, and Christopher M. De Sa. 2023. QuIP: 2-bit quantization of large language models with guarantees. Advances in Neural Information Processing Systems 36 (2023), 4396\u20134429.","journal-title":"Advances in Neural Information Processing Systems"},
{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eng.2020.01.007"},
{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001177"},
{"key":"e_1_3_1_12_2","unstructured":"Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin, Kai-Chiang Wu, and Diana Marculescu. 2025. Quamba: A post-training quantization recipe for selective state space models. In Proceedings of the 13th International Conference on Learning Representations (ICLR 2025)."},
{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3539224"},
{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.joule.2023.09.004"},
{"key":"e_1_3_1_15_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Dosovitskiy Alexey","year":"2021","unstructured":"Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations."},
{"key":"e_1_3_1_16_2","unstructured":"Daniel Y. Fu, Tri Dao, Khaled Kamal Saab, Armin W. Thomas, Atri Rudra, and Christopher R\u00e9. 2023. Hungry hungry hippos: Towards language modeling with state space models. In Proceedings of the 11th International Conference on Learning Representations (ICLR 2023)."},
{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-20870-7_7"},
{"key":"e_1_3_1_18_2","volume-title":"Proceedings of the 1st Conference on Language Modeling","author":"Gu Albert","year":"2024","unstructured":"Albert Gu and Tri Dao. 2024. Mamba: Linear-time sequence modeling with selective state spaces. In Proceedings of the 1st Conference on Language Modeling."},
{"key":"e_1_3_1_19_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Gu Albert","year":"2022","unstructured":"Albert Gu, Karan Goel, and Christopher R\u00e9. 2022. Efficiently modeling long sequences with structured state spaces. In Proceedings of the International Conference on Learning Representations."},
{"key":"e_1_3_1_20_2","first-page":"572","article-title":"Combining recurrent, convolutional, and continuous-time models with linear state space layers","volume":"34","author":"Gu Albert","year":"2021","unstructured":"Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, and Christopher R\u00e9. 2021. Combining recurrent, convolutional, and continuous-time models with linear state space layers. Advances in Neural Information Processing Systems 34 (2021), 572\u2013585.","journal-title":"Advances in Neural Information Processing Systems"},
{"key":"e_1_3_1_21_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Gu Albert","year":"2023","unstructured":"Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, and Christopher R\u00e9. 2023. How to train your HiPPO: State space models with generalized orthogonal basis projections. In Proceedings of the International Conference on Learning Representations."},
{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2021.3114179"},
{"key":"e_1_3_1_23_2","unstructured":"Alex Krizhevsky. 2009. Learning multiple layers of features from tiny images. Technical Report. University of Toronto, Toronto, ON, Canada."},
{"key":"e_1_3_1_24_2","doi-asserted-by":"crossref","unstructured":"Jinhao Li, Shan Huang, Jiaming Xu, Jun Liu, Li Ding, Ningyi Xu, and Guohao Dai. 2024. MARCA: Mamba accelerator with reconfigurable architecture. In Proceedings of the 43rd IEEE\/ACM International Conference on Computer-Aided Design. 1\u20139.","DOI":"10.1145\/3676536.3676798"},
{"key":"e_1_3_1_25_2","first-page":"34451","article-title":"Q-ViT: Accurate and fully quantized low-bit vision transformer","volume":"35","author":"Li Yanjing","year":"2022","unstructured":"Yanjing Li, Sheng Xu, Baochang Zhang, Xianbin Cao, Peng Gao, and Guodong Guo. 2022. Q-ViT: Accurate and fully quantized low-bit vision transformer. Advances in Neural Information Processing Systems 35 (2022), 34451\u201334463.","journal-title":"Advances in Neural Information Processing Systems"},
{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01565"},
{"key":"e_1_3_1_27_2","doi-asserted-by":"crossref","unstructured":"Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, and Shuchang Zhou. 2022. FQ-ViT: Post-training quantization for fully quantized vision transformer. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). 1173\u20131179.","DOI":"10.24963\/ijcai.2022\/164"},
{"key":"e_1_3_1_28_2","first-page":"28092","article-title":"Post-training quantization for vision transformer","volume":"34","author":"Liu Zhenhua","year":"2021","unstructured":"Zhenhua Liu, Yunhe Wang, Kai Han, Wei Zhang, Siwei Ma, and Wen Gao. 2021. Post-training quantization for vision transformer. Advances in Neural Information Processing Systems 34 (2021), 28092\u201328103.","journal-title":"Advances in Neural Information Processing Systems"},
{"key":"e_1_3_1_29_2","unstructured":"Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. 2017. Pointer sentinel mixture models. In Proceedings of the International Conference on Learning Representations (ICLR 2017)."},
{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2010-343"},
{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS46773.2023.10181988"},
{"key":"e_1_3_1_32_2","unstructured":"Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, and Tijmen Blankevoort. 2021. A white paper on neural network quantization. arXiv:2106.08295. Retrieved from https:\/\/arxiv.org\/abs\/2106.08295"},
{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","unstructured":"Alessandro Pappalardo. 2023. Xilinx\/brevitas. DOI:10.5281\/zenodo.3333552","DOI":"10.5281\/zenodo.3333552"},
{"key":"e_1_3_1_34_2","first-page":"363","article-title":"OPTIMUS: OPTImized matrix MUltiplication structure for transformer neural network accelerator","volume":"2","author":"Park Junki","year":"2020","unstructured":"Junki Park, Hyunsung Yoon, Daehyun Ahn, Jungwook Choi, and Jae-Joon Kim. 2020. OPTIMUS: OPTImized matrix MUltiplication structure for transformer neural network accelerator. Proceedings of Machine Learning and Systems 2 (March 2020), 363\u2013378.","journal-title":"Proceedings of Machine Learning and Systems"},
{"key":"e_1_3_1_35_2","doi-asserted-by":"crossref","unstructured":"Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, and Jeremy Kepner. 2022. AI and ML accelerator survey and trends. In 2022 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 1\u201310.","DOI":"10.1109\/HPEC55821.2022.9926331"},
{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPS-ISA58951.2023.00018"},
{"key":"e_1_3_1_37_2","doi-asserted-by":"crossref","unstructured":"Hasim Sak, Andrew W. Senior, and Fran\u00e7oise Beaufays. 2014. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Proceedings of Interspeech 2014. 338\u2013342.","DOI":"10.21437\/Interspeech.2014-80"},
{"key":"e_1_3_1_38_2","doi-asserted-by":"crossref","unstructured":"Ananda Samajdar, Jan Moritz Joseph, Yuhao Zhu, Paul Whatmough, Matthew Mattina, and Tushar Krishna. 2020. A systematic methodology for characterizing scalability of DNN accelerators using SCALE-Sim. In 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 58\u201368.","DOI":"10.1109\/ISPASS48437.2020.00016"},
{"key":"e_1_3_1_39_2","unstructured":"Shivanshu Shekhar, Tanishq Dubey, Koyel Mukherjee, Apoorv Saxena, Atharv Tyagi, and Nishanth Kotla. 2024. Towards optimizing the costs of LLM usage. arXiv:2402.01742. Retrieved from https:\/\/arxiv.org\/abs\/2402.01742"},
{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2017.47"},
{"key":"e_1_3_1_41_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Smith Jimmy T.H.","year":"2023","unstructured":"Jimmy T.H. Smith, Andrew Warrington, and Scott Linderman. 2023. Simplified state space layers for sequence modeling. In Proceedings of the 11th International Conference on Learning Representations."},
{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021744"},
{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1007\/s43681-021-00043-6"},
{"key":"e_1_3_1_44_2","unstructured":"Chloe Wang, Oleksii Tsepa, Jun Ma, and Bo Wang. 2024. Graph-Mamba: Towards long-range graph sequence modeling with selective state spaces. arXiv:2402.00789. Retrieved from https:\/\/arxiv.org\/abs\/2402.00789"},
{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00881"},
{"key":"e_1_3_1_46_2","doi-asserted-by":"crossref","unstructured":"Renjie Wei, Songqiang Xu, Linfeng Zhong, Zebin Yang, Qingyu Guo, Yuan Wang, Runsheng Wang, and Meng Li. 2025. LightMamba: Efficient Mamba acceleration on FPGA with quantization and hardware co-design. In Proceedings of the Design Automation & Test in Europe Conference (DATE 2025). 1\u20137.","DOI":"10.23919\/DATE64628.2025.10993079"},
{"key":"e_1_3_1_47_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Wu Shuang","year":"2018","unstructured":"Shuang Wu, Guoqi Li, Feng Chen, and Luping Shi. 2018. Training and inference with integers in deep neural networks. In Proceedings of the International Conference on Learning Representations."},
{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSII.2023.3241487"},
{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC63097.2024.00027"},
{"key":"e_1_3_1_50_2","unstructured":"Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747. Retrieved from https:\/\/arxiv.org\/abs\/1708.07747"},
{"key":"e_1_3_1_51_2","unstructured":"Zukang Xu, Yuxuan Yue, Xing Hu, Zhihang Yuan, Zixu Jiang, Zhixuan Chen, Jiangyong Yu, Chen Xu, Sifan Zhou, and Dawei Yang. 2025. MambaQuant: Quantizing the Mamba family with variance aligned rotation methods. In Proceedings of the 13th International Conference on Learning Representations (ICLR 2025)."},
{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10071027"},
{"key":"e_1_3_1_53_2","unstructured":"Haoran Zhu, Boyuan Chen, and Carter Yang. 2023. Understanding why ViT trains badly on small datasets: An intuitive perspective. arXiv:2302.03751. Retrieved from https:\/\/arxiv.org\/abs\/2302.03751"}],
"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3762190","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,3]],"date-time":"2025-10-03T14:07:10Z","timestamp":1759500430000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3762190"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,26]]},"references-count":52,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3762190"],"URL":"https:\/\/doi.org\/10.1145\/3762190","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,9,26]]},"assertion":[{"value":"2025-08-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-11","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-09-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
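
Editor's note (outside the Crossref record above): the abstract states that eMamba replaces expensive operations such as the SiLU activation with cheaper approximations, but this metadata record does not specify which approximation the paper uses. The snippet below is a minimal illustrative sketch only, assuming a multiplier-and-clip stand-in for SiLU (the h-swish form popularized by MobileNetV3, which avoids the exponential in the sigmoid); the function names and the [-6, 6] test range are our own, not from the paper.

    import numpy as np

    def silu(x):
        # Exact SiLU: x * sigmoid(x). The exp() makes this costly in
        # fixed-point hardware, which is why edge frameworks approximate it.
        return x / (1.0 + np.exp(-x))

    def hard_silu(x):
        # Hypothetical cheap stand-in using only add, multiply, and clip
        # (h-swish). Not necessarily the approximation eMamba uses.
        return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

    # Quick sanity check of the approximation error on a sample range.
    x = np.linspace(-6.0, 6.0, 1001)
    err = np.max(np.abs(silu(x) - hard_silu(x)))
    print(f"max |SiLU - hard SiLU| on [-6, 6]: {err:.4f}")

Per the abstract, eMamba additionally tunes the learnable parameters introduced by such approximations via an approximation-aware NAS, so constants like the 3.0 and 6.0 above would be candidates for that search rather than fixed values.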