{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,4]],"date-time":"2025-10-04T07:47:22Z","timestamp":1759564042730,"version":"3.41.2"},"reference-count":37,"publisher":"Wiley","issue":"14","license":[{"start":{"date-parts":[[2018,1,24]],"date-time":"2018-01-24T00:00:00Z","timestamp":1516752000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"funder":[{"DOI":"10.13039\/100000054","name":"National Cancer Institute","doi-asserted-by":"publisher","award":["1U24CA180924-01A1"],"award-info":[{"award-number":["1U24CA180924-01A1"]}],"id":[{"id":"10.13039\/100000054","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000092","name":"U.S. National Library of Medicine","doi-asserted-by":"publisher","award":["R01LM011119-01","R01LM009239"],"award-info":[{"award-number":["R01LM011119-01","R01LM009239"]}],"id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003593","name":"Conselho Nacional de Desenvolvimento Cient\u00edfico e Tecnol\u00f3gico","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100003593","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002322","name":"Coordena\u00e7\u00e3o de Aperfei\u00e7oamento de Pessoal de N\u00edvel Superior","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100002322","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["K25CA181503"],"award-info":[{"award-number":["K25CA181503"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Concurrency and Computation"],"published-print":{"date-parts":[[2018,7,25]]},"abstract":"<jats:title>Summary<\/jats:title><jats:p>The Irregular Wavefront Propagation Pattern (IWPP) is 
a core computing structure in several image analysis operations. Efficient implementation of IWPP on the Intel Xeon Phi is difficult because of the irregular data access and computation characteristics. The traditional IWPP algorithm relies on atomic instructions, which are not available in the SIMD set of the Intel Phi. To overcome this limitation, we have proposed a new IWPP algorithm that can take advantage of non\u2010atomic SIMD instructions supported on the Intel Xeon Phi. We have also developed and evaluated methods to use CPU and Intel Phi cooperatively for parallel execution of the IWPP algorithms. Our new cooperative IWPP version is also able to handle large out\u2010of\u2010core images that would not fit into the memory of the accelerator. The new IWPP algorithm is used to implement the Morphological Reconstruction and Fill Holes operations, which are operations commonly found in image analysis applications. The vectorization implemented with the new IWPP has attained improvements of up to about 5\u00d7 on top of the original IWPP and significant gains as compared to the state\u2010of\u2010the\u2010art CPU and GPU versions. The new version running on an Intel Phi is 6.21\u00d7 and 3.14\u00d7 faster than running on a 16\u2010core CPU and on a GPU, respectively. 
Finally, the cooperative execution using two Intel Phi devices and a multi\u2010core CPU has reached performance gains of 2.14\u00d7 as compared to the execution using a single Intel Xeon Phi.<\/jats:p>","DOI":"10.1002\/cpe.4425","type":"journal-article","created":{"date-parts":[[2018,1,24]],"date-time":"2018-01-24T22:23:34Z","timestamp":1516832614000},"source":"Crossref","is-referenced-by-count":1,"title":["Cooperative and out\u2010of\u2010core execution of the irregular wavefront propagation pattern on hybrid machines with Intel<sup>\u00ae<\/sup> Xeon Phi\u2122"],"prefix":"10.1002","volume":"30","author":[{"given":"Jeremias","family":"Gomes","sequence":"first","affiliation":[{"name":"Department of Computer Science University of Bras\u00edlia  Bras\u00edlia\u2010DF Brazil"}]},{"given":"Alba C. M. A.","family":"de Melo","sequence":"additional","affiliation":[{"name":"Department of Computer Science University of Bras\u00edlia  Bras\u00edlia\u2010DF Brazil"}]},{"given":"Jun","family":"Kong","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics Emory University  Atlanta GA USA"}]},{"given":"Tahsin","family":"Kurc","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics Stony Brook University  Stony Brook NY USA"}]},{"given":"Joel H.","family":"Saltz","sequence":"additional","affiliation":[{"name":"Department of Biomedical Informatics Stony Brook University  Stony Brook NY USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6289-3914","authenticated-orcid":false,"given":"George","family":"Teodoro","sequence":"additional","affiliation":[{"name":"Department of Computer Science University of Bras\u00edlia  Bras\u00edlia\u2010DF Brazil"},{"name":"Department of Biomedical Informatics Stony Brook University  Stony Brook NY 
USA"}]}],"member":"311","published-online":{"date-parts":[[2018,1,24]]},"reference":[{"key":"e_1_2_10_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2013.03.001"},{"key":"e_1_2_10_3_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0081049"},{"key":"e_1_2_10_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/83.217222"},{"volume-title":"Morphological Image Analysis: Principles and Applications","year":"2003","author":"Pierre Soille","key":"e_1_2_10_5_1"},{"key":"e_1_2_10_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.87344"},{"key":"e_1_2_10_7_1","unstructured":"VincentL.Exact Euclidean distance function by chain propagations. Paper presented at: IEEE International Conference on Computer Vision and Pattern Recognition;1991;Maui HI."},{"key":"e_1_2_10_8_1","doi-asserted-by":"crossref","unstructured":"TeodoroG KurcT KongJ CooperL SaltzJ.Comparative performance analysis of Intel (R) Xeon Phi (TM) GPU and CPU: a case study from microscopy image analysis. Paper presented at: IEEE 28th International Parallel and Distributed Processing Symposium (IPDPS);2014;Phoenix AZ.","DOI":"10.1109\/IPDPS.2014.111"},{"key":"e_1_2_10_9_1","doi-asserted-by":"crossref","unstructured":"GomesJM TeodoroG deMeloA KongJ KurcT SaltzJH.Efficient irregular wavefront propagation algorithms on Intel (R) Xeon Phi (TM). 
Paper presented at: 2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC\u2010PAD);2015;Florianopolis Brazil.","DOI":"10.1109\/SBAC-PAD.2015.13"},{"key":"e_1_2_10_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBME.2010.2060338"},{"key":"e_1_2_10_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2014.09.003"},{"issue":"4","key":"e_1_2_10_12_1","first-page":"291","article-title":"Quantification of histochemical staining by color deconvolution","volume":"23","author":"Ruifrok AC","year":"2001","journal-title":"Anal Quant Cytol Histol"},{"volume-title":"Intel Xeon Phi Coprocessor High\u2010Performance Programming","year":"2013","author":"Jeffers J","key":"e_1_2_10_13_1"},{"volume-title":"Digital Image Processing Using MATLAB","year":"2010","author":"Gonzalez RC","key":"e_1_2_10_14_1"},{"key":"e_1_2_10_15_1","doi-asserted-by":"crossref","unstructured":"NarayanasamyS WangZ TiganiJ EdwardsA CalderB.Automatically classifying benign and harmful data races using replay analysis. Paper presented at: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation;2007;San Diego CA.","DOI":"10.1145\/1250734.1250738"},{"key":"e_1_2_10_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2004.07.002"},{"volume-title":"STL Tutorial and Reference Guide: C++ Programming with the Standard Template Library","year":"2001","author":"Musser DR","key":"e_1_2_10_17_1"},{"key":"e_1_2_10_18_1","unstructured":"SaltzJH KurcT CholletiS et al.Multi\u2010scale integrative study of brain tumor: In silico brain tumor research center. 
Paper presented at: Annual Symposium of American Medical Informatics Association 2010 Summit on Translational Bioinformatics (AMIA\u2010TBI);2010;Washington DC."},{"volume-title":"A Fast Parallel Implementation of Queue\u2010Based Morphological Reconstruction Using GPUs","year":"2012","author":"Teodoro G","key":"e_1_2_10_19_1"},{"key":"e_1_2_10_20_1","first-page":"19","article-title":"Memory bandwidth and machine balance in current high performance computers","author":"McCalpin JD","year":"1995","journal-title":"IEEE Computer Society Technical Committee on Computer Architecture Newsletter"},{"key":"e_1_2_10_21_1","doi-asserted-by":"crossref","unstructured":"HeX AgarwalD PrasadSK.Design and implementation of a parallel priority queue on many\u2010core architectures. Paper presented at: 19th International Conference on High Performance Computing (HiPC);2012;Pune India.","DOI":"10.1109\/HiPC.2012.6507490"},{"key":"e_1_2_10_22_1","doi-asserted-by":"crossref","unstructured":"NewburnCJ DmitrievS NarayanaswamyR et al.Offload compiler runtime for the Intel (R) Xeon Phi coprocessor. Paper presented at: IEEE 27th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW);2013;Cambridge MA.","DOI":"10.1109\/IPDPSW.2013.251"},{"key":"e_1_2_10_23_1","doi-asserted-by":"crossref","unstructured":"HongS KimSK OguntebiT OlukotunK.Accelerating CUDA graph algorithms at maximum warp. Paper presented at: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming;2011;San Antonio TX.","DOI":"10.1145\/1941553.1941590"},{"key":"e_1_2_10_24_1","doi-asserted-by":"crossref","unstructured":"TaoG YutongL GuangS.Using MIC to accelerate a typical data\u2010intensive application: the breadth\u2010first search. 
Paper presented at: 2013 IEEE 27th International on Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW);2013;Cambridge MA.","DOI":"10.1109\/IPDPSW.2013.197"},{"volume-title":"Mathematical Morphology in Image Processing","year":"1992","author":"Vincent L","key":"e_1_2_10_25_1"},{"key":"e_1_2_10_26_1","doi-asserted-by":"crossref","unstructured":"LaurentC RomanJ.Parallel implementation of morphological connected operators based on irregular data structures. Paper presented at: Third International Conference on Vector and Parallel Processing VECPAR '98;1999;London UK.","DOI":"10.1007\/10703040_44"},{"key":"e_1_2_10_27_1","series-title":"OpenAccess Series in Informatics (OASIcs)","first-page":"54","volume-title":"Sixth Doctoral Workshop on Mathematical and Engineering Methods in Computer Science (MEMICS'10)\u00a0\u2013\u00a0Selected Papers","author":"Karas P","year":"2011"},{"issue":"8","key":"e_1_2_10_28_1","first-page":"822","article-title":"Image contrast enhancement using morphological decomposition by reconstruction","volume":"7","author":"Jivet I","year":"2008","journal-title":"WSEAS Trans Cir Sys"},{"key":"e_1_2_10_29_1","doi-asserted-by":"crossref","unstructured":"Anacona\u2010MosqueraO VinhalG SampaioRC TeodoroG JacobiRP LlanosCH.Efficient hardware implementation of morphological reconstruction based on sequential reconstruction algorithm. Paper presented at: 30th Symposium on Integrated Circuits and Systems Design (SBCCI);2017;Fortaleza Brazil.","DOI":"10.1145\/3109984.3110020"},{"key":"e_1_2_10_30_1","doi-asserted-by":"crossref","unstructured":"LukCK HongS KimH.Qilin: Exploiting Parallelism on Heterogeneous Multiprocessors with Adaptive Mapping. 
Paper presented at: 42nd International Symposium on Microarchitecture (MICRO);2009;New York NY.","DOI":"10.1145\/1669112.1669121"},{"key":"e_1_2_10_31_1","doi-asserted-by":"crossref","unstructured":"AugonnetC ThibaultS NamystR WacrenierPA.StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Paper presented at: 15th International Euro\u2010Par Conference on Parallel Processing;2009;Delft The Netherlands.","DOI":"10.1007\/978-3-642-03869-3_80"},{"key":"e_1_2_10_32_1","doi-asserted-by":"crossref","unstructured":"RaviVT MaW ChiuD AgrawalG.Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations. Paper presented at: Proceedings of the 24th ACM International Conference on Supercomputing;2010;Tsukuba Japan.","DOI":"10.1145\/1810085.1810106"},{"key":"e_1_2_10_33_1","doi-asserted-by":"crossref","unstructured":"HuoX RaviVT AgrawalG.Porting irregular reductions on heterogeneous CPU\u2010GPU configurations. Paper presented at: 18th International Conference on High Performance Computing (HiPC);2011;Bangalore India.","DOI":"10.1109\/HiPC.2011.6152715"},{"key":"e_1_2_10_34_1","doi-asserted-by":"crossref","unstructured":"BuenoJ PlanasJ DuranA et al.Productive programming of GPU clusters with OmpSs. Paper presented at: IEEE 26th International Parallel Distributed Processing Symposium (IPDPS);2012;Shanghai China.","DOI":"10.1109\/IPDPS.2012.58"},{"key":"e_1_2_10_35_1","doi-asserted-by":"crossref","unstructured":"RossbachCJ CurreyJ SilbersteinM RayB WitchelE.PTask: Operating system abstractions to manage GPUs as compute devices. Paper presented at: Proceedings of the Twenty\u2010Third ACM Symposium on Operating Systems Principles SOSP '11;2011;Cascais Portugal.","DOI":"10.1145\/2043556.2043579"},{"key":"e_1_2_10_36_1","doi-asserted-by":"crossref","unstructured":"GautierT LimaJVF MaillardN RaffinB.Xkaapi: A runtime system for data\u2010flow task programming on heterogeneous architectures. 
Paper presented at: 2013 IEEE International Symposium on Parallel and Distributed Processing;2013;Boston MA.","DOI":"10.1109\/IPDPS.2013.66"},{"key":"e_1_2_10_37_1","doi-asserted-by":"crossref","unstructured":"HolewinskiJ PouchetLN SadayappanP.High\u2010performance code generation for stencil computations on GPU architectures. Paper presented at: Proceedings of the 26th ACM International Conference on Supercomputing ICS '12;2012;Venice Italy.","DOI":"10.1145\/2304576.2304619"},{"key":"e_1_2_10_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33518-1_40"}],"container-title":["Concurrency and Computation: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fcpe.4425","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cpe.4425","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,23]],"date-time":"2023-09-23T23:11:30Z","timestamp":1695510690000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/cpe.4425"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,1,24]]},"references-count":37,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2018,7,25]]}},"alternative-id":["10.1002\/cpe.4425"],"URL":"https:\/\/doi.org\/10.1002\/cpe.4425","archive":["Portico"],"relation":{},"ISSN":["1532-0626","1532-0634"],"issn-type":[{"type":"print","value":"1532-0626"},{"type":"electronic","value":"1532-0634"}],"subject":[],"published":{"date-parts":[[2018,1,24]]},"article-number":"e4425"}}