{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,30]],"date-time":"2025-10-30T07:02:14Z","timestamp":1761807734700,"version":"3.40.2"},"reference-count":45,"publisher":"Wiley","issue":"18","license":[{"start":{"date-parts":[[2012,2,23]],"date-time":"2012-02-23T00:00:00Z","timestamp":1329955200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Concurrency and Computation"],"published-print":{"date-parts":[[2012,12,25]]},"abstract":"<jats:title>SUMMARY<\/jats:title><jats:p>Loops are the richest source of parallelism in scientific applications. A large number of loop scheduling schemes have therefore been devised for loops with and without data dependencies (modeled as dependence distance vectors) on heterogeneous clusters. The loops with data dependencies require synchronization via cross\u2010node communication. Synchronization requires fine\u2010tuning to overcome the communication overhead and to yield the best possible overall performance. In this paper, a theoretical model is presented to determine the granularity of synchronization that minimizes the parallel execution time of loops with data dependencies when these are parallelized on heterogeneous systems using dynamic self\u2010scheduling algorithms. New formulas are proposed for estimating the total number of scheduling steps when a threshold for the minimum work assigned to a processor is assumed. The proposed model uses these formulas to determine the synchronization granularity that minimizes the estimated parallel execution time. The accuracy of the proposed model is verified and validated via extensive experiments on a heterogeneous computing system. The results show that the theoretically optimal synchronization granularity, as determined by the proposed model, is very close to the experimentally observed optimal synchronization granularity, with no deviation in the best case, and within 38.4% in the worst case. Copyright \u00a9 2012 John Wiley &amp; Sons, Ltd.<\/jats:p>","DOI":"10.1002\/cpe.2812","type":"journal-article","created":{"date-parts":[[2012,2,24]],"date-time":"2012-02-24T02:26:42Z","timestamp":1330050402000},"page":"2302-2327","source":"Crossref","is-referenced-by-count":7,"title":["Towards the optimal synchronization granularity for dynamic scheduling of pipelined computations on heterogeneous computing systems"],"prefix":"10.1002","volume":"24","author":[{"given":"I.","family":"Riakiotakis","sequence":"first","affiliation":[{"name":"School of Electrical and Computer Engineering National Technical University of Athens 9, Heroon Polytechnioy, Zografou 15773 Athens Greece"}]},{"given":"F. M.","family":"Ciorba","sequence":"additional","affiliation":[{"name":"Center for Information Services and High Performance Computing Technische Universit\u00e4t Dresden Zellescher Weg 12\/14 Dresden 01062 Germany"}]},{"given":"T.","family":"Andronikos","sequence":"additional","affiliation":[{"name":"Department of Informatics Ionian University 7, Tsirigoti Square 49100 Corfu Greece"}]},{"given":"G.","family":"Papakonstantinou","sequence":"additional","affiliation":[{"name":"School of Electrical and Computer Engineering National Technical University of Athens 9, Heroon Polytechnioy, Zografou 15773 Athens Greece"}]},{"given":"A. T.","family":"Chronopoulos","sequence":"additional","affiliation":[{"name":"Department of Computer Science University of Texas at San Antonio TX 78249 USA"}]}],"member":"311","published-online":{"date-parts":[[2012,2,23]]},"reference":[{"key":"e_1_2_8_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSE.1985.231547"},{"key":"e_1_2_8_3_1","unstructured":"TangP YewPC.Processor self\u2010scheduling for multiple\u2010nested parallel loops.Proceedings of the International Conference on Parallel Processing St. Charles IL USA 1986;528\u2013535."},{"key":"e_1_2_8_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1987.5009495"},{"key":"e_1_2_8_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/135226.135232"},{"key":"e_1_2_8_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.205655"},{"key":"e_1_2_8_7_1","doi-asserted-by":"crossref","unstructured":"HummelSF SchmidtJ UmaRN WeinJ.Load\u2010sharing in heterogeneous systems via weighted factoring.Proceedings of 8th Annual ACM Symposium on Parallel Algorithms and Architectures ACM New York NY USA 1996.","DOI":"10.1145\/237502.237576"},{"key":"e_1_2_8_8_1","unstructured":"BanicescuI LiuZ.Adaptive factoring: a dynamic scheduling method tuned to the rate of weight changes.Proceedings of the High Performance Computing Symposium 2000 Washington USA 2000;122\u2013129."},{"key":"e_1_2_8_9_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1023588520138"},{"key":"e_1_2_8_10_1","doi-asserted-by":"crossref","unstructured":"HerreraJ HuedoE MonteroRS LlorenteIM.Loosely\u2010coupled loop scheduling in computational grids.Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium 3rd High\u2010Performance Grid Computing Workshop (HPGC 2006) Rhodes Island Greece 25\u201329 April2006.","DOI":"10.1109\/IPDPS.2006.1639657"},{"key":"e_1_2_8_11_1","doi-asserted-by":"crossref","unstructured":"PenmatsaS ChronopoulosAT KaronisNT ToonenB.Implementation of distributed loop scheduling schemes on the TeraGrid.Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007) 4th High\u2010Performance Grid Computing Workshop Long Beach California USA March2007;1\u20138.","DOI":"10.1109\/IPDPS.2007.370551"},{"key":"e_1_2_8_12_1","doi-asserted-by":"publisher","DOI":"10.1587\/transfun.E92.A.1764"},{"key":"e_1_2_8_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2008.12.003"},{"key":"e_1_2_8_14_1","doi-asserted-by":"crossref","unstructured":"ShihW\u2010C TsengS\u2010S YangC\u2010T.Performance study of parallel programming on cloud computing environments using MapReduce.Information Science and Applications (ICISA) Seul Korea 2010;1\u20138.","DOI":"10.1109\/ICISA.2010.5480515"},{"key":"e_1_2_8_15_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1627"},{"key":"e_1_2_8_16_1","doi-asserted-by":"publisher","DOI":"10.4304\/jcp.6.7.1339-1345"},{"key":"e_1_2_8_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-010-0418-y"},{"key":"e_1_2_8_18_1","doi-asserted-by":"crossref","unstructured":"CiorbaFM AndronikosT RiakiotakisI ChronopoulosAT PapakonstantinouG.Dynamic multi\u2010phase scheduling for heterogeneous clusters.Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2006) Rhodes Greece 2006.","DOI":"10.1109\/IPDPS.2006.1639308"},{"key":"e_1_2_8_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2007.07.003"},{"volume-title":"Optimizing Compilers for Modern Architectures \u2013 A Dependence\u2010based Approach","year":"2001","author":"Allen R","key":"e_1_2_8_20_1"},{"key":"e_1_2_8_21_1","unstructured":"YangC\u2010T ChengK\u2010W LiK\u2010C.An efficient load balancing scheme for grid\u2010based high performance scientific computing.Proceedings of the 19th International Conference on Advanced Information Networking and Applications (AINA'05) Tamkang University Taiwan 2005."},{"key":"e_1_2_8_22_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.960"},{"key":"e_1_2_8_23_1","doi-asserted-by":"crossref","unstructured":"KejariwalA NicolauA PolychronopoulosCD.History\u2010aware self\u2010scheduling.Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006) Columbus Ohio USA 2006;185\u2013192.","DOI":"10.1109\/ICPP.2006.49"},{"key":"e_1_2_8_24_1","doi-asserted-by":"crossref","unstructured":"RiakiotakisI CiorbaFM AndronikosT PapakonstantinouG.Self\u2010adapting scheduling for tasks with data dependencies in stochastic environments.Proceedings of the 5th International Workshop on Algorithms Models and Tools for Parallel Computing on Heterogeneous Networks (HeteroPar 06) of the CLUSTER 2006 Barcelona Spain 2006.","DOI":"10.1109\/CLUSTR.2006.311912"},{"key":"e_1_2_8_25_1","first-page":"165","article-title":"Optimal grain size computation for pipelined algorithms","volume":"1","author":"Desprez F","year":"1996","journal-title":"Proceeding Euro\u2010Par '96 Proceedings of the Second International Euro\u2010Par Conference on Parallel Processing"},{"key":"e_1_2_8_26_1","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1997.1371"},{"key":"e_1_2_8_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4615-4337-4"},{"key":"e_1_2_8_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8191(02)00098-4"},{"key":"e_1_2_8_29_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342004041294"},{"key":"e_1_2_8_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.406958"},{"key":"e_1_2_8_31_1","unstructured":"HuangT\u2010C HsuP\u2010H ShengT\u2010N.Efficient runtime scheduling for parallelizing partially parallel loop.Proceedings of the 3rd International Conference on Algorithms and Architectures for Parallel Processing Melbourne Victoria Australia 1997;397\u2013403."},{"key":"e_1_2_8_32_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007577115980"},{"key":"e_1_2_8_33_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626499000207"},{"key":"e_1_2_8_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0140-3664(99)00073-0"},{"key":"e_1_2_8_35_1","doi-asserted-by":"crossref","unstructured":"CiorbaFM RiakiotakisI AndronikosT ChronopoulosAT PapakonstantinouG.Optimal synchronization frequency for dynamic pipelined computations on heterogeneous systems (poster).Proceedings of IEEE International Conference on Cluster Computing (CLUSTER 2007) Austin TX USA 2007.","DOI":"10.1109\/CLUSTR.2007.4629257"},{"key":"e_1_2_8_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.peva.2010.08.020"},{"key":"e_1_2_8_37_1","unstructured":"MPI Performance Topics. Available from:https:\/\/computing.llnl.gov\/tutorials\/mpi_performance\/\\#MessageSize Mpptest:http:\/\/www.mcs.anl.gov\/research\/projects\/mpi\/mpptest\/ and Perftest package:http:\/\/wilbur.mcs.anl.gov\/pub\/mpi\/tools\/."},{"key":"e_1_2_8_38_1","doi-asserted-by":"crossref","unstructured":"CierniakM LiW ZakiMJ.Loop scheduling for heterogeneity.Proceedings of the 4th IEEE International Symposium on High Performance Distributed Computing Washington DC 1995;78\u201385.","DOI":"10.1109\/HPDC.1995.518697"},{"key":"e_1_2_8_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/32.83908"},{"key":"e_1_2_8_40_1","unstructured":"YueKK LijlaDJ.Parallel loop scheduling for high\u2010performance computers.Technical Report No. HPPC\u201094\u201012 Dept. of Computer Science Univ. of Minnesota 1994."},{"key":"e_1_2_8_41_1","doi-asserted-by":"publisher","DOI":"10.1504\/IJCSE.2005.009696"},{"key":"e_1_2_8_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/183432.183440"},{"volume-title":"Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers","year":"2005","author":"Wilkinson B","key":"e_1_2_8_43_1"},{"key":"e_1_2_8_44_1","unstructured":"Mathcad. Available from:http:\/\/www.mathcad.com."},{"key":"e_1_2_8_45_1","first-page":"75","article-title":"An adaptive algorithm for spatial grey scale","volume":"17","author":"Floyd RW","year":"1976","journal-title":"Proceedings of the Society for Information Display"},{"key":"e_1_2_8_46_1","doi-asserted-by":"publisher","DOI":"10.1016\/0022-2836(70)90057-4"}],"container-title":["Concurrency and Computation: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fcpe.2812","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cpe.2812","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,21]],"date-time":"2025-03-21T17:13:00Z","timestamp":1742577180000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/cpe.2812"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,2,23]]},"references-count":45,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2012,12,25]]}},"alternative-id":["10.1002\/cpe.2812"],"URL":"https:\/\/doi.org\/10.1002\/cpe.2812","archive":["Portico"],"relation":{},"ISSN":["1532-0626","1532-0634"],"issn-type":[{"type":"print","value":"1532-0626"},{"type":"electronic","value":"1532-0634"}],"subject":[],"published":{"date-parts":[[2012,2,23]]}}}