{"id":14263,"date":"2018-06-18T19:15:55","date_gmt":"2018-06-18T23:15:55","guid":{"rendered":"http:\/\/pzd.hmy.temporary.site\/?p=14263"},"modified":"2018-06-18T19:15:55","modified_gmt":"2018-06-18T23:15:55","slug":"r-vs-python-image-classification-with-keras","status":"publish","type":"post","link":"https:\/\/datascienceplus.com\/r-vs-python-image-classification-with-keras\/","title":{"rendered":"R vs Python: Image Classification with Keras"},"content":{"rendered":"<p>Libraries for running <a href=\"https:\/\/rpy2.bitbucket.io\/\" target=\"_blank\">R from Python<\/a> or <a href=\"https:\/\/rstudio.github.io\/reticulate\/\" target=\"_blank\">Python from R<\/a> have existed for years. Moreover, <a href=\"http:\/\/wesmckinney.com\/\" target=\"_blank\">Wes McKinney<\/a> recently <a href=\"http:\/\/wesmckinney.com\/blog\/announcing-ursalabs\/\" target=\"_blank\">announced<\/a> the foundation of <a href=\"https:\/\/ursalabs.org\/\" target=\"_blank\">Ursa Labs<\/a>, which aims to join forces with RStudio, and <a href=\"http:\/\/hadley.nz\/\" target=\"_blank\">Hadley Wickham<\/a> in particular (find more <a href=\"https:\/\/qz.com\/1270139\/r-and-python-are-joining-forces-in-the-most-ambitious-crossover-event-of-the-year-for-programmers\/\" target=\"_blank\">here<\/a>), to improve data scientists&#8217; workflows and to unify libraries for use not only in Python, but in any programming language used by data scientists. Despite all this, some data professionals are still very strict about the language to be used for ANN models, limiting their dev. environment exclusively to Python.  <\/p>\n<p>As a continuation of my <a href=\"https:\/\/www.dkisler.de\/2018\/02\/19\/r-versus-python-loops-required-time\/\" target=\"_blank\">R vs. 
Python comparison<\/a>, I decided to test the performance of both languages in terms of the time required to train a <a href=\"http:\/\/cs231n.github.io\/convolutional-networks\/\" target=\"_blank\">convolutional neural network<\/a> based model for image recognition. As a starting point, I took <a href=\"https:\/\/shirinsplayground.netlify.com\/2018\/06\/keras_fruits\/\" target=\"_blank\">the blog post<\/a> by <a href=\"https:\/\/www.linkedin.com\/in\/shirin-glander-01120881\/\" target=\"_blank\">Dr. Shirin Glander<\/a> on how easy it is to build a CNN model in R using Keras.<\/p>\n<p>A few words about <a href=\"https:\/\/keras.io\/\" target=\"_blank\">Keras<\/a>. It is a Python library for artificial neural network ML models which provides a high-level frontend to various deep learning frameworks, with <a href=\"https:\/\/www.tensorflow.org\/\" target=\"_blank\">Tensorflow<\/a> being the default one.<br \/>\nKeras has many pros, with the following among them:<\/p>\n<ul>\n<li>Complex models can be built in just a few lines of code =&gt; perfect for a quick dev. cycle to experiment and check your ideas<\/li>\n<li>Code recycling: one can easily swap the backend framework (let&#8217;s say from <a href=\"https:\/\/docs.microsoft.com\/en-ca\/cognitive-toolkit\/\" target=\"_blank\">CNTK<\/a> to <a href=\"https:\/\/www.tensorflow.org\/\" target=\"_blank\">Tensorflow<\/a> or vice versa) =&gt; <a href=\"https:\/\/en.wikipedia.org\/wiki\/Don%27t_repeat_yourself\" target=\"_blank\">DRY principle<\/a><\/li>\n<li>Seamless use of the GPU =&gt; perfect for fast model tuning and experimenting<\/li>\n<\/ul>\n<p>Since Keras is written in Python, it may be a natural choice to use Python for your dev. environment. 
And that was the case until about a year ago when <a href=\"https:\/\/www.rstudio.com\/\" target=\"_blank\">RStudio<\/a> founder <a href=\"https:\/\/en.wikipedia.org\/wiki\/Joseph_J._Allaire\" target=\"_blank\">J.J. Allaire<\/a> <a href=\"https:\/\/blog.rstudio.com\/2017\/09\/05\/keras-for-r\/\" target=\"_blank\">announced<\/a> the release of <a href=\"https:\/\/keras.rstudio.com\/\" target=\"_blank\">the Keras library for R<\/a> in May&#8217;17. I consider this a turning point for data scientists: we can now be more flexible with our dev. environment and deliver results more efficiently, with the opportunity to extend existing solutions we may have written in R.<\/p>\n<p>This brings me to the point of this post.<br \/>\n<strong>My hypothesis is<\/strong>: when it comes to ANN ML model building with Keras, Python is not a must, and depending on your client&#8217;s request or tech stack, R can be used without limitations and with similar efficiency.  <\/p>\n<h2>Image Classification with Keras<\/h2>\n<p>In order to test my hypothesis, I am going to perform image classification using the <a href=\"https:\/\/www.kaggle.com\/moltean\/fruits\/data\" target=\"_blank\">fruit images data<\/a> from Kaggle and train a CNN model with four hidden layers: two 2D convolutional layers, one pooling layer and one dense layer. <a href=\"https:\/\/youtu.be\/_e-LFe_igno\" target=\"_blank\">RMSProp<\/a> is used as the <a href=\"https:\/\/keras.io\/optimizers\/\" target=\"_blank\">optimizer function<\/a>.<\/p>\n<h3>Tech stack<\/h3>\n<p><strong>Hardware<\/strong>:<br \/>\nCPU: Intel Core i7-7700HQ: 4 cores (8 threads), 2800 &#8211; 3800 (Boost) MHz core clock<br \/>\nGPU: Nvidia Geforce GTX 1050 Ti Mobile: 4Gb vRAM, 1493 &#8211; 1620 (Boost) MHz core clock<br \/>\nRAM: 16 Gb<\/p>\n<p><strong>Software<\/strong>:<br \/>\nOS: Linux Ubuntu 16.04<br \/>\nR: ver. 3.4.4<br \/>\nPython: ver. 3.6.3<br \/>\nKeras: ver. 2.2<br \/>\nTensorflow: ver. 1.7.0<br \/>\nCUDA: ver. 
9.0 (note that the current Tensorflow version supports CUDA ver. 9.0, while the up-to-date version of CUDA is 9.2)<br \/>\ncuDNN: ver. 7.0.5 (note that the current Tensorflow version supports cuDNN ver. 7.0, while the up-to-date version of cuDNN is 7.1)<\/p>\n<h2>Code<\/h2>\n<p>The R and Python code snippets used for CNN model building are presented below. Thanks to the fruitful collaboration between F. Chollet and J.J. Allaire, the logic and function names in R closely mirror the Python ones.<\/p>\n<h3>R<\/h3>\n<pre>\r\n## Courtesy: Dr. Shirin Glander. Code source: https:\/\/shirinsplayground.netlify.com\/2018\/06\/keras_fruits\/\r\n\r\nlibrary(keras)\r\nstart &lt;- Sys.time()\r\nfruit_list &lt;- c(&quot;Kiwi&quot;, &quot;Banana&quot;, &quot;Plum&quot;, &quot;Apricot&quot;, &quot;Avocado&quot;, &quot;Cocos&quot;, &quot;Clementine&quot;, &quot;Mandarine&quot;, &quot;Orange&quot;,\r\n                &quot;Limes&quot;, &quot;Lemon&quot;, &quot;Peach&quot;, &quot;Plum&quot;, &quot;Raspberry&quot;, &quot;Strawberry&quot;, &quot;Pineapple&quot;, &quot;Pomegranate&quot;)\r\n\r\n# number of output classes (i.e. fruits)\r\noutput_n &lt;- length(fruit_list)\r\n\r\n# image size to scale down to (original images are 100 x 100 px)\r\nimg_width &lt;- 20\r\nimg_height &lt;- 20\r\ntarget_size &lt;- c(img_width, img_height)\r\n\r\n# RGB = 3 channels\r\nchannels &lt;- 3\r\n\r\n# path to image folders\r\npath &lt;- &quot;path\/to\/folder\/with\/data&quot;\r\ntrain_image_files_path &lt;- file.path(path, &quot;fruits-360&quot;, &quot;Training&quot;)\r\nvalid_image_files_path &lt;- file.path(path, &quot;fruits-360&quot;, &quot;Test&quot;)\r\n\r\n# optional data augmentation\r\ntrain_data_gen &lt;- image_data_generator(\r\n  rescale = 1\/255\r\n)\r\n\r\n# Validation data shouldn&#039;t be augmented! 
But it should also be scaled.\r\nvalid_data_gen &lt;- image_data_generator(\r\n  rescale = 1\/255\r\n)  \r\n\r\n# training images\r\ntrain_image_array_gen &lt;- flow_images_from_directory(train_image_files_path, \r\n                                                    train_data_gen,\r\n                                                    target_size = target_size,\r\n                                                    class_mode = &#039;categorical&#039;,\r\n                                                    classes = fruit_list,\r\n                                                    seed = 42)\r\n\r\n# validation images\r\nvalid_image_array_gen &lt;- flow_images_from_directory(valid_image_files_path, \r\n                                                    valid_data_gen,\r\n                                                    target_size = target_size,\r\n                                                    class_mode = &#039;categorical&#039;,\r\n                                                    classes = fruit_list,\r\n                                                    seed = 42)\r\n\r\n### model definition\r\n# number of training samples\r\ntrain_samples &lt;- train_image_array_gen$n\r\n# number of validation samples\r\nvalid_samples &lt;- valid_image_array_gen$n\r\n\r\n# define batch size and number of epochs\r\nbatch_size &lt;- 32\r\nepochs &lt;- 10\r\n\r\n# initialise model\r\nmodel &lt;- keras_model_sequential()\r\n\r\n# add layers\r\nmodel %&gt;% \r\n  layer_conv_2d(filter = 32, kernel_size = c(3,3), padding = 'same', input_shape = c(img_width, img_height, channels)) %&gt;%\r\n  layer_activation('relu') %&gt;%\r\n  \r\n  # Second hidden layer\r\n  layer_conv_2d(filter = 16, kernel_size = c(3,3), padding = 'same') %&gt;%\r\n  layer_activation_leaky_relu(0.5) %&gt;%\r\n  layer_batch_normalization() %&gt;%\r\n  \r\n  # Use max pooling\r\n  layer_max_pooling_2d(pool_size = c(2,2)) %&gt;%\r\n  layer_dropout(0.25) %&gt;%\r\n  \r\n  # Flatten max filtered output 
into feature vector \r\n  # and feed into dense layer\r\n  layer_flatten() %&gt;%\r\n  layer_dense(100) %&gt;%\r\n  layer_activation('relu') %&gt;%\r\n  layer_dropout(0.5) %&gt;%\r\n  \r\n  # Outputs from dense layer are projected onto output layer\r\n  layer_dense(output_n) %&gt;% \r\n  layer_activation('softmax')\r\n\r\n# compile\r\nmodel %&gt;% compile(\r\n  loss = 'categorical_crossentropy',\r\n  optimizer = optimizer_rmsprop(lr = 0.0001, decay = 1e-6),\r\n  metrics = 'accuracy'\r\n)\r\n# fit\r\nhist &lt;- fit_generator(\r\n  # training data\r\n  train_image_array_gen,\r\n  \r\n  # epochs\r\n  steps_per_epoch = as.integer(train_samples \/ batch_size), \r\n  epochs = epochs,\r\n  \r\n  # validation data\r\n  validation_data = valid_image_array_gen,\r\n  validation_steps = as.integer(valid_samples \/ batch_size),\r\n  \r\n  # print progress\r\n  verbose = 2,\r\n  callbacks = list(\r\n    # save best model after every epoch\r\n    callback_model_checkpoint(file.path(path, \"fruits_checkpoints.h5\"), save_best_only = TRUE),\r\n    # only needed for visualising with TensorBoard\r\n    callback_tensorboard(log_dir = file.path(path, \"fruits_logs\"))\r\n  )\r\n)\r\n\r\ndf_out &lt;- hist$metrics %&gt;% \r\n  {data.frame(acc = .$acc[epochs], val_acc = .$val_acc[epochs], elapsed_time = as.integer(Sys.time()) - as.integer(start))}\r\n<\/pre>\n<h3>Python<\/h3>\n<pre>\r\n## Author: D. 
Kisler - adaptation of R code from https:\/\/shirinsplayground.netlify.com\/2018\/06\/keras_fruits\/\r\n\r\nfrom keras.preprocessing.image import ImageDataGenerator\r\nfrom keras.models import Sequential\r\nfrom keras.layers import (Conv2D,\r\n                          Dense,\r\n                          LeakyReLU,\r\n                          BatchNormalization, \r\n                          MaxPooling2D, \r\n                          Dropout,\r\n                          Flatten)\r\nfrom keras.optimizers import RMSprop\r\nfrom keras.callbacks import ModelCheckpoint, TensorBoard\r\nimport PIL.Image\r\nfrom datetime import datetime as dt\r\n\r\nstart = dt.now()\r\n\r\n# fruits categories\r\nfruit_list = [\"Kiwi\", \"Banana\", \"Plum\", \"Apricot\", \"Avocado\", \"Cocos\", \"Clementine\", \"Mandarine\", \"Orange\",\r\n                \"Limes\", \"Lemon\", \"Peach\", \"Plum\", \"Raspberry\", \"Strawberry\", \"Pineapple\", \"Pomegranate\"]\r\n# number of output classes (i.e. fruits)\r\noutput_n = len(fruit_list)\r\n# image size to scale down to (original images are 100 x 100 px)\r\nimg_width = 20\r\nimg_height = 20\r\ntarget_size = (img_width, img_height)\r\n# image RGB channels number\r\nchannels = 3\r\n# path to image folders\r\npath = \"path\/to\/folder\/with\/data\"\r\ntrain_image_files_path = path + \"\/fruits-360\/Training\"\r\nvalid_image_files_path = path + \"\/fruits-360\/Test\"\r\n\r\n## input data augmentation\/modification\r\n# training images\r\ntrain_data_gen = ImageDataGenerator(\r\n  rescale = 1.\/255\r\n)\r\n# validation images\r\nvalid_data_gen = ImageDataGenerator(\r\n  rescale = 1.\/255\r\n)\r\n\r\n## getting data\r\n# training images\r\ntrain_image_array_gen = train_data_gen.flow_from_directory(train_image_files_path,                                                            \r\n                                                           target_size = target_size,\r\n                                                           classes = fruit_list, 
\r\n                                                           class_mode = 'categorical',\r\n                                                           seed = 42)\r\n\r\n# validation images\r\nvalid_image_array_gen = valid_data_gen.flow_from_directory(valid_image_files_path, \r\n                                                           target_size = target_size,\r\n                                                           classes = fruit_list,\r\n                                                           class_mode = 'categorical',\r\n                                                           seed = 42)\r\n\r\n## model definition\r\n# number of training samples\r\ntrain_samples = train_image_array_gen.n\r\n# number of validation samples\r\nvalid_samples = valid_image_array_gen.n\r\n# define batch size and number of epochs\r\nbatch_size = 32\r\nepochs = 10\r\n\r\n# initialise model\r\nmodel = Sequential()\r\n\r\n# add layers\r\n# input layer\r\nmodel.add(Conv2D(filters = 32, kernel_size = (3,3), padding = 'same', input_shape = (img_width, img_height, channels), activation = 'relu'))\r\n# hidden conv layer\r\nmodel.add(Conv2D(filters = 16, kernel_size = (3,3), padding = 'same'))\r\nmodel.add(LeakyReLU(.5))\r\nmodel.add(BatchNormalization())\r\n# using max pooling\r\nmodel.add(MaxPooling2D(pool_size = (2,2)))\r\n# randomly switch off 25% of the nodes per epoch step to avoid overfitting\r\nmodel.add(Dropout(.25))\r\n# flatten max filtered output into feature vector\r\nmodel.add(Flatten())\r\n# output features onto a dense layer\r\nmodel.add(Dense(units = 100, activation = 'relu'))\r\n# randomly switch off 50% of the nodes per epoch step to avoid overfitting\r\nmodel.add(Dropout(.5))\r\n# output layer with the number of units equal to the number of categories\r\nmodel.add(Dense(units = output_n, activation = 'softmax'))\r\n\r\n# compile the model\r\nmodel.compile(loss = 'categorical_crossentropy', \r\n              metrics = ['accuracy'], \r\n              optimizer 
= RMSprop(lr = 1e-4, decay = 1e-6))\r\n\r\n# train the model\r\nhist = model.fit_generator(\r\n  # training data\r\n  train_image_array_gen,\r\n\r\n  # epochs\r\n  steps_per_epoch = train_samples \/\/ batch_size, \r\n  epochs = epochs, \r\n\r\n  # validation data\r\n  validation_data = valid_image_array_gen,\r\n  validation_steps = valid_samples \/\/ batch_size,\r\n\r\n  # print progress\r\n  verbose = 2,\r\n  callbacks = [\r\n    # save best model after every epoch\r\n    ModelCheckpoint(\"fruits_checkpoints.h5\", save_best_only = True),\r\n    # only needed for visualising with TensorBoard\r\n    TensorBoard(log_dir = \"fruits_logs\")\r\n  ]\r\n)\r\n\r\ndf_out = {'acc': hist.history['acc'][epochs - 1], 'val_acc': hist.history['val_acc'][epochs - 1], 'elapsed_time': (dt.now() - start).seconds}\r\n<\/pre>\n<h2>Experiment<\/h2>\n<p>The models above were <strong>trained 10 times<\/strong> with R and Python on both GPU and CPU; the elapsed time and the final accuracy after 10 epochs were measured. 
The results of the measurements are presented in the plots below (click a plot to be redirected to the <a href=\"https:\/\/plot.ly\" target=\"_blank\">plotly<\/a> interactive plots).<\/p>\n<p><a href=\"https:\/\/datascienceplus.com\/wp-content\/uploads\/2018\/06\/3.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/datascienceplus.com\/wp-content\/uploads\/2018\/06\/3-490x350.png\" alt=\"\" width=\"490\" height=\"350\" class=\"alignnone size-medium wp-image-14284\" srcset=\"https:\/\/datascienceplus.com\/wp-content\/uploads\/2018\/06\/3-490x350.png 490w, https:\/\/datascienceplus.com\/wp-content\/uploads\/2018\/06\/3.png 700w\" sizes=\"auto, (max-width: 490px) 100vw, 490px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/datascienceplus.com\/wp-content\/uploads\/2018\/06\/5.png\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/datascienceplus.com\/wp-content\/uploads\/2018\/06\/5-490x350.png\" alt=\"\" width=\"490\" height=\"350\" class=\"alignnone size-medium wp-image-14285\" srcset=\"https:\/\/datascienceplus.com\/wp-content\/uploads\/2018\/06\/5-490x350.png 490w, https:\/\/datascienceplus.com\/wp-content\/uploads\/2018\/06\/5.png 700w\" sizes=\"auto, (max-width: 490px) 100vw, 490px\" \/><\/a><\/p>\n<p>From the plots above, one can see that:<\/p>\n<ul>\n<li>the accuracy of your model doesn&#8217;t depend on the language you use to build and train it (the plot shows only the training accuracy, but the model doesn&#8217;t have high variance and the validation accuracy is around 99% as well).<\/li>\n<li>even though 10 measurements may not be convincing, Python reduced the time required to train the CNN model by up to 15%. 
This is somewhat expected because R uses Python under the hood when executing Keras functions.<\/li>\n<\/ul>\n<p>Let&#8217;s perform an unpaired t-test, assuming that all our observations are normally distributed.<\/p>\n<pre>\r\nlibrary(dplyr)\r\nlibrary(data.table)\r\n# fetch the data used to plot graphs\r\nd &lt;- fread('keras_elapsed_time_rvspy.csv')\r\n\r\n# unpaired t test:\r\n# t_score = (mean1 - mean2)\/sqrt(stdev1^2\/n1+stdev2^2\/n2)\r\n\r\nd %&gt;%\r\n    group_by(lang, eng) %&gt;%\r\n    summarise(el_mean = mean(elapsed_time),\r\n              el_std = sd(elapsed_time),\r\n              n = n()) %&gt;% data.frame() %&gt;%\r\n    group_by(eng) %&gt;%\r\n    summarise(t_score = round(diff(el_mean)\/sqrt(sum(el_std^2\/n)), 2))\r\n<\/pre>\n<table>\n<tr>\n<th>eng<\/th>\n<th>t_score<\/th>\n<\/tr>\n<tr>\n<td>cpu<\/td>\n<td>11.38<\/td>\n<\/tr>\n<tr>\n<td>gpu<\/td>\n<td>9.64<\/td>\n<\/tr>\n<\/table>\n<p>The t-scores reflect a significant difference between the time required to train a CNN model in R and in Python, as we saw in the plots above. <\/p>\n<h2>Summary<\/h2>\n<p>Building and training a CNN model in R using Keras is as &#8220;easy&#8221; as in Python, with the same coding logic and function naming conventions. The final accuracy of your Keras model will depend on the neural net architecture, hyperparameter tuning, training duration, train\/test data amount etc., but not on the programming language you use for your DS project. Training a CNN Keras model in Python may be up to 15% faster compared to R.<\/p>\n<h2>P.S.<\/h2>\n<p>If you would like to know more about Keras and to be able to build models with this awesome library, I recommend these books: <\/p>\n<ul>\n<li><a href=\"https:\/\/www.amazon.com\/dp\/1787125939\" target=\"_blank\"> Deep Learning with Python by F. Chollet<\/a> (one of the Keras creators)<\/li>\n<li><a href=\"https:\/\/www.amazon.com\/dp\/161729554X\" target=\"_blank\"> Deep Learning with R by F. Chollet and J.J. 
Allaire<\/a><\/li>\n<\/ul>\n<p>As well as this <a href=\"https:\/\/www.udemy.com\/zero-to-deep-learning\/\" target=\"_blank\">Udemy course<\/a> to start your journey with Keras.<\/p>\n<p>Thanks a lot for your attention! I hope this post will help aspiring data scientists gain an understanding of the use cases for different technologies and avoid being biased when it comes to selecting the tools for their DS projects.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Libraries for running R from Python or Python from R have existed for years. Moreover, Wes McKinney recently announced the foundation of Ursa Labs, which aims to join forces with RStudio, and Hadley Wickham in particular (find more here), to improve data scientists workflow and unify libraries to [&hellip;]<\/p>\n","protected":false},"author":2211,"featured_media":14286,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[237,92,232,383],"class_list":["post-14263","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-advanced-modeling","tag-keras","tag-machine-learning","tag-rstats","tag-tensorflow"],"views":7696,"_links":{"self":[{"href":"https:\/\/datascienceplus.com\/wp-json\/wp\/v2\/posts\/14263","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/datascienceplus.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/datascienceplus.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/datascienceplus.com\/wp-json\/wp\/v2\/users\/2211"}],"replies":[{"embeddable":true,"href":"https:\/\/datascienceplus.com\/wp-json\/wp\/v2\/comments?post=14263"}],"version-history":[{"count":0,"href":"https:\/\/datascienceplus.com\/wp-json\/wp\/v2\/posts\/14263\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/datascienceplus.com\/wp-json\/wp\/v2\/med
ia\/14286"}],"wp:attachment":[{"href":"https:\/\/datascienceplus.com\/wp-json\/wp\/v2\/media?parent=14263"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/datascienceplus.com\/wp-json\/wp\/v2\/categories?post=14263"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/datascienceplus.com\/wp-json\/wp\/v2\/tags?post=14263"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}