{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T16:03:04Z","timestamp":1774022584600,"version":"3.50.1"},"reference-count":37,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2018,9,30]],"date-time":"2018-09-30T00:00:00Z","timestamp":1538265600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Grenoble Alpes M\u00e9tropole through the Nano2017 Esprit project"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2018,9,30]]},"abstract":"<jats:p>Although performing inference with artificial neural networks (ANN) was until quite recently considered essentially compute-intensive, the emergence of deep neural networks, coupled with the evolution of integration technology, has transformed inference into a memory-bound problem. With this established, many recent works have focused on minimizing memory accesses, either by enforcing and exploiting sparsity on weights or by representing activations and weights with few bits, so that ANN inference can be deployed in embedded devices. In this work, we detail an architecture dedicated to inference using ternary {\u22121, 0, 1} weights and activations. This architecture is configurable at design time to provide throughput vs. power trade-offs to choose from. It is also generic in the sense that it uses information drawn from the target technologies (memory geometries and cost, number of available cuts, etc.) to best adapt to the FPGA resources. This makes it possible to achieve up to 5.2k frames per second per Watt for classification on a VC709 board while using approximately half of the resources of the FPGA.<\/jats:p>","DOI":"10.1145\/3270764","type":"journal-article","created":{"date-parts":[[2018,12,12]],"date-time":"2018-12-12T12:49:32Z","timestamp":1544618972000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":15,"title":["High-Efficiency Convolutional Ternary Neural Networks with Custom Adder Trees and Weight Compression"],"prefix":"10.1145","volume":"11","author":[{"given":"Adrien","family":"Prost-Boucle","sequence":"first","affiliation":[{"name":"Univ. Grenoble Alpes, CNRS, Grenoble INP*, TIMA, Grenoble, France"}]},{"given":"Alban","family":"Bourge","sequence":"additional","affiliation":[{"name":"Univ. Grenoble Alpes, CNRS, Grenoble INP*, TIMA, Grenoble, France"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0624-7373","authenticated-orcid":false,"given":"Fr\u00e9d\u00e9ric","family":"P\u00e9trot","sequence":"additional","affiliation":[{"name":"Univ. Grenoble Alpes, CNRS, Grenoble INP*, TIMA, Grenoble, France"}]}],"member":"320","published-online":{"date-parts":[[2018,12,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2017.7966166"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2017.2682138"},{"key":"e_1_2_1_3_1","volume-title":"Andrew Davidson","author":"Batcher Ken","year":"1995","unstructured":"Ken Batcher. 1987. Quoted in \u201cHumour the computer\u201d, Andrew Davidson, 1995, MIT Press, p. 40."},{"key":"e_1_2_1_4_1","volume-title":"Binaryconnect: Training deep neural networks with binary weights during propagations. 
In Advances in Neural Information Processing Systems","author":"Courbariaux Matthieu","year":"2015","unstructured":"Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2015. Binaryconnect: Training deep neural networks with binary weights during propagations. In Advances in Neural Information Processing Systems. MIT Press, 3123--3131."},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the IEEE International Solid-State Circuits Conference. IEEE, 238--239","author":"Desoli Giuseppe","year":"2017","unstructured":"Giuseppe Desoli, Nitin Chawla, Thomas Boesch, Surinder-pal Singh, Elio Guidetti, Fabio De Ambroggi, Tommaso Majo, Paolo Zambotti, Manuj Ayodhyawasi, Harvinder Singh, and Nalin Aggarwal. 2017. 14.1A 2.9TOPS\/W deep convolutional neural network SoC in FD-SOI 28nm for intelligent embedded systems. In Proceedings of the IEEE International Solid-State Circuits Conference. IEEE, 238--239."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3029580.3029586"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.30"},{"key":"e_1_2_1_8_1","volume-title":"Kwok","author":"Hou Lu","year":"2016","unstructured":"Lu Hou, Quanming Yao, and James T. Kwok. 2016. Loss-aware binarization of deep networks. arXiv:1611.01600."},{"key":"e_1_2_1_9_1","unstructured":"Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2016. Quantized neural networks: Training neural networks with low precision weights and activations. arXiv:1609.07061."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/SiPS.2014.6986082"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2815631"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/3130379.3130723"},{"key":"e_1_2_1_13_1","volume-title":"Seminumerical algorithms","author":"Knuth Donald E.","unstructured":"Donald E. Knuth. 1997. Seminumerical algorithms, vol. 2. In The Art of Computer Programming. Addison-Wesley, Reading."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2014.6927468"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_2_1_17_1","unstructured":"Fengfu Li, Bo Zhang, and Bin Liu. 2016. Ternary weight networks. arXiv:1605.04711."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021786"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079758"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 27th International Conference on Field Programmable Logic and Applications.","author":"Moss Duncan J. M.","unstructured":"Duncan J. M. Moss, Eriko Nurvitadhi, Jaewoong Sim, Asit Mishra, Debbie Marr, Suchit Subhaschandra, and Philip H. W. Leong. 2017. 
High performance binary neural networks on the Xeon+FPGA platform. In Proceedings of the 27th International Conference on Field Programmable Logic and Applications."},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 27th International Conference on Field Programmable Logic and Applications.","author":"Nakahara Hiroki","year":"2017","unstructured":"Hiroki Nakahara, Tomoya Fujii, and Shimpei Sato. 2017. A fully connected layer elimination for a binarized convolutional neural network on an FPGA. In Proceedings of the 27th International Conference on Field Programmable Logic and Applications."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning.","author":"Netzer Yuval","unstructured":"Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. 2011. Reading digits in natural images with unsupervised feature learning. In Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021740"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7471828"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/MDAT.2016.2573586"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.23919\/FPL.2017.8056850"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"e_1_2_1_28_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556."},{"key":"e_1_2_1_29_1","volume-title":"Proceedings of the International Joint Conference on Neural Networks.","author":"Stallkamp Johannes","year":"2011","unstructured":"Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. 2011. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. In Proceedings of the International Joint Conference on Neural Networks."},{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alex Alemi. 2016. Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv:1602.07261.","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1815961.1816008"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021744"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021744"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 17th International Forum on MPSoC.","author":"Vissers K.","year":"2017","unstructured":"K. Vissers. 2017. A framework for reduced precision neural networks on FPGA. In Proceedings of the 17th International Forum on MPSoC. Retrieved from http:\/\/www.mpsoc-forum.org\/previous\/2017\/files\/proceedings\/Kees_Vissers.pdf."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.23919\/FPL.2017.8056794"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021741"},{"key":"e_1_2_1_37_1","volume-title":"Dally","author":"Zhu Chenzhuo","year":"2017","unstructured":"Chenzhuo Zhu, Song Han, Huizi Mao, and William J. Dally. 2017. Trained ternary quantization. arXiv:1612.01064v3."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the 34th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO\u201911)","author":"\u0160koda Peter","year":"2011","unstructured":"Peter \u0160koda, Tomislav Lipi\u0107, \u00c0goston Srp, Branka Medved Rogina, Karolj Skala, and Ferenc Vajda. 2011. Implementation framework for artificial neural networks on FPGA. 
In Proceedings of the 34th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO\u201911). 274--278."}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3270764","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3270764","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:13:10Z","timestamp":1750212790000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3270764"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,9,30]]},"references-count":37,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2018,9,30]]}},"alternative-id":["10.1145\/3270764"],"URL":"https:\/\/doi.org\/10.1145\/3270764","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"value":"1936-7406","type":"print"},{"value":"1936-7414","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,9,30]]},"assertion":[{"value":"2017-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-12-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}