{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,31]],"date-time":"2025-07-31T00:31:59Z","timestamp":1753921919755,"version":"3.40.4"},"reference-count":65,"publisher":"Wiley","issue":"7","license":[{"start":{"date-parts":[[2014,10,9]],"date-time":"2014-10-09T00:00:00Z","timestamp":1412812800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"funder":[{"DOI":"10.13039\/501100000783","name":"Research Executive Agency","doi-asserted-by":"publisher","award":["286545"],"award-info":[{"award-number":["286545"]}],"id":[{"id":"10.13039\/501100000783","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Concurrency and Computation"],"published-print":{"date-parts":[[2016,5]]},"abstract":"<jats:title>Summary<\/jats:title><jats:p>The support vector machine (SVM) is a supervised learning algorithm used for recognizing patterns in data. It is a very popular technique in machine learning and has been successfully used in applications such as image classification, protein classification, and handwriting recognition. However, the computational complexity of the kernelized version of the algorithm grows quadratically with the number of training examples. To tackle this high computational complexity, we have developed a directive\u2010based approach that converts a gradient\u2010ascent based training algorithm for the CPU to an efficient graphics processing unit (GPU) implementation. We compare our GPU\u2010based SVM training algorithm to the standard LibSVM CPU implementation, a highly optimized GPU\u2010LibSVM implementation, as well as to a directive\u2010based OpenACC implementation. The results on different handwritten digit classification datasets demonstrate an important speed\u2010up for the current approach when compared to the CPU and OpenACC versions. 
Furthermore, our solution is almost as fast and sometimes even faster than the highly optimized CUBLAS\u2010based GPU\u2010LibSVM implementation, without sacrificing the algorithm's accuracy. Copyright \u00a9 2014 John Wiley &amp; Sons, Ltd.<\/jats:p>","DOI":"10.1002\/cpe.3413","type":"journal-article","created":{"date-parts":[[2014,10,10]],"date-time":"2014-10-10T00:57:38Z","timestamp":1412902658000},"page":"2274-2294","source":"Crossref","is-referenced-by-count":9,"title":["Evaluating automatically parallelized versions of the support vector machine"],"prefix":"10.1002","volume":"28","author":[{"given":"Valeriu","family":"Codreanu","sequence":"first","affiliation":[{"name":"Johann Bernoulli Institute for Mathematics and Computer Science University of Groningen Groningen The Netherlands"},{"name":"Electronic Systems Group Eindhoven University of Technology Eindhoven The Netherlands"}]},{"given":"Bob","family":"Dr\u00f6ge","sequence":"additional","affiliation":[{"name":"Donald Smits Centrum voor Informatie Technologie University of Groningen Groningen The Netherlands"}]},{"given":"David","family":"Williams","sequence":"additional","affiliation":[{"name":"Johann Bernoulli Institute for Mathematics and Computer Science University of Groningen Groningen The Netherlands"}]},{"given":"Burhan","family":"Yasar","sequence":"additional","affiliation":[{"name":"Rotasoft Inc. 
Ankara Turkey"}]},{"given":"Po","family":"Yang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology University of Bedfordshire Bedford UK"}]},{"given":"Baoquan","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology University of Bedfordshire Bedford UK"}]},{"given":"Feng","family":"Dong","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology University of Bedfordshire Bedford UK"}]},{"given":"Olarik","family":"Surinta","sequence":"additional","affiliation":[{"name":"Institute of Artificial Intelligence and Cognitive Engineering University of Groningen Groningen The Netherlands"}]},{"given":"Lambert R.B.","family":"Schomaker","sequence":"additional","affiliation":[{"name":"Institute of Artificial Intelligence and Cognitive Engineering University of Groningen Groningen The Netherlands"}]},{"given":"Jos B.T.M.","family":"Roerdink","sequence":"additional","affiliation":[{"name":"Johann Bernoulli Institute for Mathematics and Computer Science University of Groningen Groningen The Netherlands"}]},{"given":"Marco A.","family":"Wiering","sequence":"additional","affiliation":[{"name":"Institute of Artificial Intelligence and Cognitive Engineering University of Groningen Groningen The Netherlands"}]}],"member":"311","published-online":{"date-parts":[[2014,10,9]]},"reference":[{"issue":"7","key":"e_1_2_10_2_1","article-title":"Next generation data warehouse design with big data for big analytics and better insights","volume":"13","author":"Baboo S","year":"2013","journal-title":"Global Journal of Computer Science and Technology"},{"key":"e_1_2_10_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1961189.1961199"},{"key":"e_1_2_10_4_1","unstructured":"MujaM LoweDG.FLANN 2009. 
fast library for approximate nearest neighbors."},{"key":"e_1_2_10_5_1","unstructured":"NissenS.Implementation of a Fast Artificial Neural Network library (FANN) Report Department of Computer Science University of Copenhagen (DIKU) 31 2003."},{"key":"e_1_2_10_6_1","unstructured":"GalloyM.CPU vs. GPU performance. (Available from:http:\/\/michaelgalloy.com\/2013\/06\/11\/cpu-vs-gpu-performance.html) [Accessed on 26 May 2014]."},{"volume-title":"Programming massively parallel processors: a hands\u2010on approach","year":"2010","author":"Kirk DB","key":"e_1_2_10_7_1"},{"key":"e_1_2_10_8_1","doi-asserted-by":"crossref","unstructured":"CavanaghJM PotokTE CuiX.Parallel latent semantic analysis using a graphics processing unit.Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers ACM Montreal Canada 2009;2505\u20132510.","DOI":"10.1145\/1570256.1570352"},{"key":"e_1_2_10_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2440-0"},{"key":"e_1_2_10_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_2_10_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2010.41"},{"key":"e_1_2_10_12_1","first-page":"355","article-title":"GPUMLib: an efficient open\u2010source GPU machine learning library","volume":"3","author":"Lopes N","year":"2011","journal-title":"International Journal of Computer Information Systems and Industrial Management Applications"},{"key":"e_1_2_10_13_1","doi-asserted-by":"crossref","first-page":"318","DOI":"10.7551\/mitpress\/5236.001.0001","volume-title":"Parallel Distributed Processing","author":"Rumelhart DE","year":"1986"},{"key":"e_1_2_10_14_1","first-page":"25","article-title":"Advanced forecasting methods for global crisis warning and models of intelligence","author":"Werbos PJ","year":"1977","journal-title":"General 
Systems"},{"key":"e_1_2_10_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2004.01.013"},{"key":"e_1_2_10_16_1","doi-asserted-by":"crossref","unstructured":"SteinkrausD BuckI SimardP.Using GPUs for machine learning algorithms.Proceedings. Eighth International Conference on Document Analysis and Recognition 2005 IEEE Seoul South Korea 2005;1115\u20131120.","DOI":"10.1109\/ICDAR.2005.251"},{"key":"e_1_2_10_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/11508069_45"},{"key":"e_1_2_10_18_1","unstructured":"ZhongwenL HongzhiL ZhengpingY XincaiW.Self\u2010organizing maps computing on graphic process unit 2005."},{"key":"e_1_2_10_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CEC.2005.1554979"},{"key":"e_1_2_10_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/11539902_134"},{"key":"e_1_2_10_21_1","unstructured":"ChellapillaK PuriS SimardP et al.High performance convolutional neural networks for document processing.Tenth International Workshop on Frontiers in Handwriting Recognition La Baule France 2006."},{"key":"e_1_2_10_22_1","doi-asserted-by":"crossref","unstructured":"BruntonA ShuC RothG.Belief propagation on the GPU for stereo vision.The 3rd Canadian Conference on Computer and Robot Vision 2006. IEEE Quebec Canada 2006;76\u201376.","DOI":"10.1109\/CRV.2006.19"},{"key":"e_1_2_10_23_1","first-page":"989","article-title":"Real\u2010time global stereo matching using hierarchical belief propagation","volume":"6","author":"Yang Q","year":"2006","journal-title":"BMVC"},{"key":"e_1_2_10_24_1","doi-asserted-by":"crossref","unstructured":"CatanzaroB SundaramN KeutzerK.Fast support vector machine training and classification on graphics processors.Proceedings of the 25th International Conference on Machine learning ACM Helsinki Finland 2008;104\u2013111.","DOI":"10.1145\/1390156.1390170"},{"key":"e_1_2_10_25_1","unstructured":"CarpenterA.cuSVM: a CUDA implementation of support vector classification and regression 2009. 
(Available from:patternsonascreen.net\/cuSVMDesc.pdf)."},{"key":"e_1_2_10_26_1","unstructured":"AthanasopoulosA DimouA MezarisV KompatsiarisI.GPU acceleration for support vector machines.Procs. 12th Inter. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS 2011) Delft Netherlands 2011."},{"volume-title":"CUBLAS library","year":"2008","author":"Nvidia C","key":"e_1_2_10_27_1"},{"key":"e_1_2_10_28_1","unstructured":"Cire\u015fanDC MeierU GambardellaLM SchmidhuberJ.Handwritten digit recognition with a committee of deep neural nets on GPUs 2011. arXiv preprint arXiv:1103.4487."},{"key":"e_1_2_10_29_1","doi-asserted-by":"crossref","unstructured":"CodreanuV DongF LiuB RoerdinkJB WilliamsD YangP Yasar B.GPU\u2010ASIFT: a fast fully affine\u2010invariant feature extraction algorithm.Proceedings of the International Conference High Performance Computing and Simulation IEEE Helsinki Finland 2013;474\u2013481.","DOI":"10.1109\/HPCSim.2013.6641456"},{"key":"e_1_2_10_30_1","unstructured":"WuC.SiftGPU manual. (Available from:http:\/\/cs.unc.edu\/~ccwu) [Accessed on 10 December 2013]."},{"key":"e_1_2_10_31_1","unstructured":"KimC SatishN ChhuganiJ SaitoH KrishnaiyerR SmelyanskiyM GirkarM DubeyP.Closing the ninja performance gap through traditional programming and compiler technology.Technical Report Intel Labs 2011."},{"key":"e_1_2_10_32_1","unstructured":"RuppK.CPU GPU and MIC Hardware Characteristics Over Time. (Available from:http:\/\/www.karlrupp.net\/2013\/06\/cpu-gpu-and-mic-hardware-characteristics-over-time\/) [Accessed on 26 May 2014]."},{"key":"e_1_2_10_33_1","unstructured":"N. P. P. NVIDIA February2011. 
11."},{"key":"e_1_2_10_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.214548"},{"key":"e_1_2_10_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/99.660313"},{"key":"e_1_2_10_36_1","doi-asserted-by":"crossref","unstructured":"WolfeM.Implementing the PGI accelerator model.Proceedings of the 3rd Workshop on General\u2010Purpose Computation on Graphics Processing Units ACM Pittsburgh PA 2010;43\u201350.","DOI":"10.1145\/1735688.1735697"},{"volume-title":"A Comparative Study of OpenACC Implementations","year":"2012","author":"Reyes R","key":"e_1_2_10_37_1"},{"key":"e_1_2_10_38_1","doi-asserted-by":"crossref","unstructured":"IrigoinF JouvelotP TrioletR.Semantical interprocedural parallelization: an overview of the PIPS project.Proceedings of the 5th International Conference on Supercomputing ACM Cologne 1991;244\u2013251.","DOI":"10.1145\/109025.109086"},{"key":"e_1_2_10_39_1","unstructured":"AminiM CreusilletB EvenS KeryellR GoubierO GueltonS McMahonJO PasquierF\u2010X P\u00e9anG VillalonP et al.Par4All: from convex array regions to heterogeneous computing.IMPACT 2012: Second International Workshop on Polyhedral Compilation Techniques HiPEAC 2012 Paris France 2012."},{"key":"e_1_2_10_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/193209.193217"},{"key":"e_1_2_10_41_1","doi-asserted-by":"crossref","unstructured":"MikushinD LikhogrudN ZhangEZ Bergstr\u00f6mC.KernelGen\u2014the design and implementation of a next generation compiler platform for accelerating numerical models on GPUs.Technical Report USI Technical Report Series in Informatics 2013.","DOI":"10.1109\/IPDPSW.2014.115"},{"key":"e_1_2_10_42_1","unstructured":"GrosserT ZhengH AloorR Simb\u00fcrgerA Gr\u00f6sslingerA PouchetL\u2010N.Polly\u2010polyhedral optimization in LLVM.Proceedings of the First International Workshop on Polyhedral Compilation Techniques (IMPACT) Vol.2011;2011"},{"key":"e_1_2_10_43_1","doi-asserted-by":"crossref","unstructured":"HanTD AbdelrahmanTS.hiCUDA: a high\u2010level 
directive\u2010based language for GPU programming.Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units ACM Washington DC USA 2009;52\u201361.","DOI":"10.1145\/1513895.1513902"},{"key":"e_1_2_10_44_1","doi-asserted-by":"crossref","unstructured":"WilliamsD CodreanuV YangP LiuB DongF YasarB MahdianB ChiariniA ZhaoX RoerdinkJB.Evaluation of autoparallelization toolkits for commodity graphics hardware.Proceedings of the 10th International Conference on Parallel Processing and Applied Mathematics Warsaw Poland 2013;447\u2013457.","DOI":"10.1007\/978-3-642-55224-3_42"},{"key":"e_1_2_10_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-32820-6_85"},{"key":"e_1_2_10_46_1","unstructured":"MikalsenMA.OpenACC\u2010based snow simulation 2013."},{"key":"e_1_2_10_47_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-11970-5_14"},{"key":"e_1_2_10_48_1","doi-asserted-by":"crossref","unstructured":"UnatD CaiX BadenSB.Mint: realizing CUDA performance in 3D stencil methods with annotated C.Proceedings of the international conference on Supercomputing ACM Tucson AZ USA 2011;214\u2013224.","DOI":"10.1145\/1995896.1995932"},{"volume-title":"C4.5 Programs for Machine Learning","year":"1993","author":"Quinlan J","key":"e_1_2_10_49_1"},{"key":"e_1_2_10_50_1","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(86)90072-X"},{"volume-title":"Pattern Classification and Scene Analysis","year":"1973","author":"Duda R","key":"e_1_2_10_51_1"},{"key":"e_1_2_10_52_1","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511801389"},{"volume-title":"Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond","year":"2002","author":"Sch\u00f6lkopf B","key":"e_1_2_10_53_1"},{"key":"e_1_2_10_54_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1018628609742"},{"key":"e_1_2_10_55_1","unstructured":"PlattJ.Sequential minimal optimization: a fast algorithm for training support vector machines 
1998."},{"key":"e_1_2_10_56_1","doi-asserted-by":"crossref","unstructured":"KennedyJ EberhartR.Particle swarm optimization.Proceedings of the IEEE International Conference on Neural Networks Vol.4 Perth Australia 1995;1942\u20131948.","DOI":"10.1109\/ICNN.1995.488968"},{"key":"e_1_2_10_57_1","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0040810"},{"key":"e_1_2_10_58_1","doi-asserted-by":"crossref","unstructured":"AmdahlGM.Validity of the single processor approach to achieving large scale computing capabilities.Proceedings of the April 18\u201320 1967 Spring Joint Computer Conference ACM Atlantic City NJ USA 1967;483\u2013485.","DOI":"10.1145\/1465482.1465560"},{"key":"e_1_2_10_59_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626400000214"},{"key":"e_1_2_10_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_2_10_61_1","doi-asserted-by":"publisher","DOI":"10.1162\/NECO_a_00052"},{"key":"e_1_2_10_62_1","doi-asserted-by":"crossref","unstructured":"MeierU CiresanD GambardellaL SchmidhuberJ.Better digit recognition with a committee of simple neural nets.2011 International Conference on Document Analysis and Recognition (ICDAR) Beijing China 2011;1250\u20131254.","DOI":"10.1109\/ICDAR.2011.252"},{"key":"e_1_2_10_63_1","doi-asserted-by":"crossref","unstructured":"CiresanDC MeierU SchmidhuberJ.Multi\u2010column deep neural networks for image classification.2012 IEEE Conference on Computer Vision and Pattern Recognition Providence RI USA 2012;3642\u20133649.","DOI":"10.1109\/CVPR.2012.6248110"},{"issue":"2","key":"e_1_2_10_64_1","article-title":"Handwritten Bangla basic and compound character recognition using MLP and SVM classifier","volume":"2","author":"Das N","year":"2010","journal-title":"Journal of Computing"},{"key":"e_1_2_10_65_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10032-009-0084-x"},{"key":"e_1_2_10_66_1","doi-asserted-by":"crossref","unstructured":"SurintaO SchomakerL WieringM.A comparison of feature and 
pixel\u2010based methods for recognizing handwritten Bangla digits.Proceedings of the Twelfth International Conference on Document Analysis and Recognition (ICDAR) Washington DC USA 2013.","DOI":"10.1109\/ICDAR.2013.40"}],"container-title":["Concurrency and Computation: Practice and Experience"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1002%2Fcpe.3413","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cpe.3413","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/cpe.3413","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/cpe.3413","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,5]],"date-time":"2025-05-05T02:56:48Z","timestamp":1746413808000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/cpe.3413"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,10,9]]},"references-count":65,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2016,5]]}},"alternative-id":["10.1002\/cpe.3413"],"URL":"https:\/\/doi.org\/10.1002\/cpe.3413","archive":["Portico"],"relation":{},"ISSN":["1532-0626","1532-0634"],"issn-type":[{"type":"print","value":"1532-0626"},{"type":"electronic","value":"1532-0634"}],"subject":[],"published":{"date-parts":[[2014,10,9]]}}}