{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:25:03Z","timestamp":1750220703063,"version":"3.41.0"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"5s","license":[{"start":{"date-parts":[[2019,10,7]],"date-time":"2019-10-07T00:00:00Z","timestamp":1570406400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2019,10,31]]},"abstract":"<jats:p>In the design of mobile systems, hardware\/software (HW\/SW) co-design offers important advantages by creating specialized hardware for performance or power optimization. Dynamic binary translation (DBT) is a key component in co-design. During the translation, a dynamic optimizer in the DBT system applies various software optimizations to improve the quality of the translated code. With dynamic optimization, optimization time is an exposed run-time overhead and useful analyses are often restricted due to their high costs. Thus, a dynamic optimizer needs to make smart decisions with limited analysis information, which complicates the design of optimization decision models and often causes failures in human-made heuristics. In mobile systems, this problem is even more challenging because of strict constraints on computing capabilities and memory size.<\/jats:p>\n          <jats:p>\n            To overcome the challenge, we investigate an opportunity to build practical optimization decision models for DBT by using machine learning techniques. As the first step,\n            <jats:italic>loop unrolling<\/jats:italic>\n            is chosen as the representative optimization. 
We base our approach on an industrial-strength DBT infrastructure and conduct the evaluation with 17,116 unrollable loops collected from 200 benchmarks and real-life programs across various domains. Using all available features that are potentially important for the loop unrolling decision, we identify the best classification algorithm for our infrastructure, considering both prediction accuracy and cost. A greedy feature selection algorithm is then applied to the classification algorithm to identify its significant features and reduce the feature space. Retaining only the significant features, the best affordable classifier, which satisfies the budgets allocated to the decision process, achieves 74.5% prediction accuracy for the optimal unroll factor and realizes an average 20.9% reduction in dynamic instruction count during steady-state execution of the translated code. For comparison, the best baseline heuristic achieves 46.0% prediction accuracy with an average 13.6% instruction count reduction. 
Given that the infrastructure is already highly optimized and the ideal upper bound for instruction reduction is observed at 23.8%, we believe this result is noteworthy.\n          <\/jats:p>","DOI":"10.1145\/3358185","type":"journal-article","created":{"date-parts":[[2019,10,10]],"date-time":"2019-10-10T13:13:05Z","timestamp":1570713185000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Multi-objective Exploration for Practical Optimization Decisions in Binary Translation"],"prefix":"10.1145","volume":"18","author":[{"given":"Sunghyun","family":"Park","sequence":"first","affiliation":[{"name":"University of Michigan, Ann Arbor, Michigan"}]},{"given":"Youfeng","family":"Wu","sequence":"additional","affiliation":[{"name":"Intel Corporation, Santa Clara, CA"}]},{"given":"Janghaeng","family":"Lee","sequence":"additional","affiliation":[{"name":"Intel Corporation, Santa Clara, CA"}]},{"given":"Amir","family":"Aupov","sequence":"additional","affiliation":[{"name":"Intel Corporation, Santa Clara, CA"}]},{"given":"Scott","family":"Mahlke","sequence":"additional","affiliation":[{"name":"University of Michigan, Ann Arbor, Michigan"}]}],"member":"320","published-online":{"date-parts":[[2019,10,7]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2019-02-08. Intel Core i7 Embedded Processor. https:\/\/ark.intel.com\/products\/series\/122593\/8th-Generation-Intel-Core-i7-Processors#@embedded."},{"key":"e_1_2_1_2_1","unstructured":"2019-06-02. 3DMark. https:\/\/www.3dmark.com\/."},{"key":"e_1_2_1_3_1","unstructured":"2019-06-02. FPMark. https:\/\/www.eembc.org\/fpmark\/."},{"key":"e_1_2_1_4_1","unstructured":"2019-06-02. Geekbench. 
https:\/\/www.geekbench.com\/."},{"key":"e_1_2_1_5_1","unstructured":"2019-06-02. SYSmark. https:\/\/bapco.com\/products\/sysmark-2018\/."},{"key":"e_1_2_1_6_1","unstructured":"2019-06-02. TabletMark. https:\/\/bapco.com\/products\/end-of-life-products\/tabletmark\/."},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Felice Balarin, Paolo Giusto, Attila Jurecska, Michael Chiodo, Harry Hsieh, Claudio Passerone, Ellen Sentovich, Luciano Lavagno, Bassam Tabbara, Alberto Sangiovanni-Vincentelli, et al. 1997. Hardware-software Co-design of Embedded Systems: The POLIS Approach. Springer Science & Business Media.","DOI":"10.1007\/978-1-4615-6127-9"},{"key":"e_1_2_1_8_1","volume-title":"Dag Sverre Seljebotn, and Kurt Smith","author":"Behnel Stefan","year":"2011","unstructured":"Stefan Behnel, Robert Bradshaw, Craig Citro, Lisandro Dalcin, Dag Sverre Seljebotn, and Kurt Smith. 2011. Cython: The best of both worlds. Computing in Science & Engineering 13, 2 (2011), 31--39."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772954.1772959"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3185768.3185771"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996864"},{"volume-title":"Supercomputing, 2005. 
Proceedings of the ACM\/IEEE SC 2005 Conference. IEEE, 14--14","author":"Cavazos John","key":"e_1_2_1_12_1","unstructured":"John Cavazos and Michael F. P. O\u2019Boyle. 2005. Automatic tuning of inlining heuristics. In Supercomputing, 2005. Proceedings of the ACM\/IEEE SC 2005 Conference. IEEE, 14--14."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1167515.1167492"},{"key":"e_1_2_1_14_1","volume-title":"Davidson and Sanjay Jinturkar","author":"Jack","year":"1996","unstructured":"Jack W. Davidson and Sanjay Jinturkar. 1996. Aggressive loop unrolling in a retargetable, optimizing compiler. In International Conference on Compiler Construction. Springer, 59--73."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2003.1191529"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.931892"},{"volume-title":"Proceedings of the 27th Annual International Symposium on Microarchitecture. ACM, 85--94","author":"Govindarajan Ramaswamy","key":"e_1_2_1_17_1","unstructured":"Ramaswamy Govindarajan, Erik R. Altman, and Guang R. Gao. 1994. Minimizing register requirements under resource-constrained rate-optimal software pipelining. In Proceedings of the 27th Annual International Symposium on Microarchitecture. 
ACM, 85--94."},{"key":"e_1_2_1_18_1","volume-title":"Patterson","author":"Hennessy John L.","year":"2011","unstructured":"John L. Hennessy and David A. Patterson. 2011. Computer Architecture: A Quantitative Approach. Elsevier."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/1186736.1186737"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/1772954.1772965"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/381694.378831"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2009.21"},{"key":"e_1_2_1_23_1","first-page":"18","article-title":"Classification and regression by randomForest","volume":"2","author":"Liaw Andy","year":"2002","unstructured":"Andy Liaw, Matthew Wiener, et al. 2002. Classification and regression by randomForest. R News 2, 3 (2002), 18--22.","journal-title":"R News"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2018.00028"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.14778\/3231751.3231770"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.689643"},{"key":"e_1_2_1_27_1","first-page":"797","article-title":"Intelligent loop unrolling","volume":"5","author":"Mahadevan Uma","year":"1998","unstructured":"Uma Mahadevan and Lacky Shah. 1998. Intelligent loop unrolling. 
US Patent 5,797,013.","journal-title":"US Patent"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/646053.677574"},{"key":"e_1_2_1_29_1","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","author":"Pedregosa Fabian","year":"2011","unstructured":"Fabian Pedregosa, Ga\u00ebl Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, Oct (2011), 2825--2830.","journal-title":"Journal of Machine Learning Research 12"},{"key":"e_1_2_1_30_1","volume-title":"Timothy Sherwood, and Brad Calder.","author":"Perelman Erez","year":"2003","unstructured":"Erez Perelman, Greg Hamerly, Michael Van Biesbrouck, Timothy Sherwood, and Brad Calder. 2003. Using SimPoint for accurate and efficient simulation. In ACM SIGMETRICS Performance Evaluation Review, Vol. 31. ACM, 318--319."},{"key":"e_1_2_1_31_1","volume-title":"ACM SIGSOFT Software Engineering Notes","volume":"36","author":"Ravindar Archana","unstructured":"Archana Ravindar and Y. N. Srikant. 2011. Relative roles of instruction count and cycles per instruction in WCET estimation. 
In ACM SIGSOFT Software Engineering Notes, Vol. 36. ACM, 55--60."},{"key":"e_1_2_1_32_1","volume-title":"Artificial Intelligence: A Modern Approach. Malaysia","author":"Russell Stuart J","year":"2016","unstructured":"Stuart J Russell and Peter Norvig. 2016. Artificial Intelligence: A Modern Approach. Malaysia; Pearson Education Limited."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/335231.335246"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2005.29"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/781131.781141"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2451116.2451172"},{"key":"e_1_2_1_37_1","volume-title":"Machine learning in compiler optimization. Proc","author":"Wang Zheng","year":"2018","unstructured":"Zheng Wang and Michael O\u2019Boyle. 2018. Machine learning in compiler optimization. Proc. 
IEEE (2018)."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1997.599625"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.293155"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/780822.781140"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1080\/095281300146272"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/5326.897072"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3358185","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3358185","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:32:58Z","timestamp":1750199578000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3358185"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,7]]},"references-count":42,"journal-issue":{"issue":"5s","published-print":{"date-parts":[[2019,10,31]]}},"alternative-id":["10.1145\/3358185"],"URL":"https:\/\/doi.org\/10.1145\/3358185","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2019,10,7]]},"assertion":[{"value":"2019-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}