{"id":7895,"date":"2020-09-14T06:00:00","date_gmt":"2020-09-14T00:30:00","guid":{"rendered":"https:\/\/debuggercafe.com\/?p=7895"},"modified":"2023-01-08T17:09:26","modified_gmt":"2023-01-08T11:39:26","slug":"spatial-transformer-network-using-pytorch","status":"publish","type":"post","link":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/","title":{"rendered":"Spatial Transformer Network using PyTorch"},"content":{"rendered":"\n<p>In this tutorial, we will go through the concepts of <strong><a href=\"https:\/\/arxiv.org\/abs\/1506.02025\" target=\"_blank\" rel=\"noreferrer noopener\">Spatial Transformer Networks<\/a><\/strong> in deep learning and neural networks. The paper <strong><a href=\"https:\/\/arxiv.org\/pdf\/1506.02025.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Spatial Transformer Networks<\/a><\/strong> was submitted by Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu in 2015. It addresses a very important problem in Convolutional Neural Networks and computer vision in general as well. In short, it addresses the lack of spatial invariance property in deep convolutional neural networks. We will get to know all about this in detail. 
<strong><em>We will also apply Spatial Transformer Networks using PyTorch<\/em><\/strong>.<\/p>\n\n\n\n<p><strong><em>What will you learn in this tutorial?<\/em><\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><em>What are Spatial Transformer Networks (STNs)?<\/em><\/li><li><em>Why are they important, and what problems do they solve?<\/em><ul><li><em>The problems with standard CNNs.<\/em><\/li><li><em>The solution proposed by STNs.<\/em><\/li><\/ul><\/li><li><em>Implementing STN using PyTorch to get a strong grasp on the concept.<\/em><ul><li><em>We will use the CIFAR10 dataset.<\/em><\/li><\/ul><\/li><\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What are Spatial Transformer Networks (STNs)?<\/h2>\n\n\n\n<p>In general, we can call any convolutional neural network that contains a <em><strong>Spatial Transformer<\/strong><\/em> module a Spatial Transformer Network. So, now the question is, <strong><em>what are Spatial Transformer modules?<\/em><\/strong><\/p>\n\n\n\n<p>The spatial transformer module consists of neural network layers that can spatially transform an image. These spatial transformations include cropping, scaling, rotation, and deformation.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Why do We Need STNs?<\/h3>\n\n\n\n<p>Standard convolutional neural networks are not spatially invariant to different types of input data. This means that they suffer from:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><em>Scale \/ size variation in the input data.<\/em><\/li><li><em>Rotation variation in the input data.<\/em><\/li><li><em>Clutter in the input data.<\/em><\/li><\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>CNNs perform poorly when the input data contains this much variation. One partial solution is the max-pooling layer. But then again, max-pooling layers do not make the CNN invariant to large transformations in the input data. <\/p>\n\n\n\n<p>This gives rise to the concept of Spatial Transformer Networks. 
In STNs, the transformer module knows where to apply the transformation to properly scale, resize, and crop an image. <strong><em>We can apply the STN module to the input data directly, or even to the feature maps<\/em><\/strong>. <strong><em>In simple words, we can say that the spatial transformer module acts as an attention mechanism and knows where to focus on the input data.<\/em><\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Taking a Look at a Simple Example<\/h3>\n\n\n\n<p>It will be much easier to understand with an example image.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"490\" height=\"392\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/stn_mnist_exmp.png\" alt=\"The working of Spatial Transformer Network on the Distorted MNIST dataset.\" class=\"wp-image-7997\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/stn_mnist_exmp.png 490w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/stn_mnist_exmp-300x240.png 300w\" sizes=\"auto, (max-width: 490px) 100vw, 490px\" \/><figcaption><strong>Figure 1. The working of Spatial Transformer Network on the Distorted MNIST dataset (<a href=\"https:\/\/arxiv.org\/pdf\/1506.02025.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Source<\/a>).<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<p>In <strong>figure 1<\/strong> we see four columns, (a), (b), (c), and (d). These images are from the MNIST dataset. Column (a) shows the input images to the Spatial Transformer Network. We can see that some images are deformed and some contain clutter as well. Column (b) shows where the localization network part of the STN focuses while applying the transformations. In column (c) we can see the output after the transformations. The network focuses on the digit 7, rotates the digit 5 to a more appropriate position, and crops the digit 6 to remove the clutter. 
What we see in column (d) is the classification output after we give the transformed images as input to a standard CNN classifier.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Benefits of Spatial Transformer Networks<\/h3>\n\n\n\n<p>There are mainly three benefits of Spatial Transformer Networks which make them easy to use.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><em>We can include a spatial transformer module almost anywhere in an existing CNN model. Obviously, we will have to change the network architecture a bit, but that is relatively easy to do.<\/em><\/li><li><em>Spatial Transformer Networks are dynamic and flexible. We can easily train STNs with the backpropagation algorithm.<\/em><\/li><li><em>They work both on the input image data directly and on the feature map outputs from standard CNN layers.<\/em><\/li><\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>The above three benefits make STNs much easier to use, and we will also implement them using the PyTorch framework further on. Before that, let&#8217;s take a brief look at the architecture of the Spatial Transformer Network.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Architecture of Spatial Transformers<\/h2>\n\n\n\n<p>The architecture of a Spatial Transformer Network is based on three important parts.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><em>The localization network.<\/em><\/li><li><em>The parameterized sampling grid.<\/em><\/li><li><em>And differentiable image sampling.<\/em><\/li><\/ul>\n\n\n\n<p><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"779\" height=\"332\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/stn_arch.png\" alt=\"High level architecture of Spatial Transformer Neural Network.\" class=\"wp-image-7998\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/stn_arch.png 779w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/stn_arch-300x128.png 300w, 
https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/stn_arch-768x327.png 768w\" sizes=\"auto, (max-width: 779px) 100vw, 779px\" \/><figcaption><strong>Figure 2. High level architecture of Spatial Transformer Neural Network (<a href=\"https:\/\/arxiv.org\/pdf\/1506.02025.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Source<\/a>).<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<p><strong>Figure 2<\/strong> shows the overall architecture of the Spatial Transformer Network.<\/p>\n\n\n\n<p><strong><em>We will go over each of these briefly, but in enough detail to help us with the coding. We will not go into the mathematical details, as they are out of the scope of this article.<\/em><\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Localization Network<\/h3>\n\n\n\n<p>The localization network takes the input feature map and outputs the parameters of the spatial transformations that should be applied to the feature map. The localization network is a very simple stack of convolutional layers. <\/p>\n\n\n\n<p>If you take a look at <strong>figure 2<\/strong>, then \\(U\\) is the feature map input to the localization network. The network outputs \\(\\theta\\), the transformation parameters that are regressed from the localization network. The final regression layers are fully-connected linear layers. In <strong>figure 2<\/strong>, \\(\\mathcal{T}_\\theta\\) is the transformation operation using the parameters \\(\\theta\\).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The Parameterized Sampling Grid<\/h3>\n\n\n\n<p>To get the desired output, the input feature map should be sampled using the parameterized sampling grid. The grid generator outputs the parameterized sampling grid.<\/p>\n\n\n\n<p>Let \\(G\\) be the sampling grid. <strong><em>Now, how do we transform the input feature map to get the desired results?<\/em><\/strong> Remember, we have the transformation parameters \\(\\theta\\) and the transformation is defined by \\(\\mathcal{T}_\\theta\\). 
Well, we apply the transformation \\(\\mathcal{T}_\\theta\\) to the grid \\(G\\). That is, \\(\\mathcal{T}_\\theta(G)\\).<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"410\" height=\"350\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/t_theta_g.png\" alt=\"Warping the regular grid with affine transformation using Spatial Transformer Network.\" class=\"wp-image-7999\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/t_theta_g.png 410w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/t_theta_g-300x256.png 300w\" sizes=\"auto, (max-width: 410px) 100vw, 410px\" \/><figcaption><strong>Figure 3. Warping the regular grid with affine transformation using regression parameters theta (<a href=\"https:\/\/arxiv.org\/pdf\/1506.02025.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">Source<\/a>).<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<p><strong>Figure 3 <\/strong>shows the result of warping the regular grid with the affine transformation \\(\\mathcal{T}_\\theta(G)\\).<\/p>\n\n\n\n<p>The output pixels lie on the grid \\(G = \\{G_i\\}\\), where \\(G_i = (x_i^t, y_i^t)\\). Here, \\((x_i^t, y_i^t)\\) are the target coordinates.<\/p>\n\n\n\n<p>Now, let us assume that \\(\\mathcal{T}_\\theta\\) is a 2D affine transformation \\(\\mathbf{A}_\\theta\\). 
Now, the following is the whole transformation operation.<\/p>\n\n\n<p>$$<br \/>\n\\left( \\begin{array}{c} x_i^s \\\\ y_i^s \\end{array} \\right) = \\mathcal{T}_\\theta(G_i) = \\mathbf{A}_\\theta \\left( \\begin{array}{c} x_i^t \\\\ y_i^t \\\\ 1 \\end{array} \\right) =  \\left[ \\begin{array}{ccc} \\theta_{11} &amp; \\theta_{12} &amp; \\theta_{13} \\\\ \\theta_{21} &amp; \\theta_{22} &amp; \\theta_{23} \\end{array} \\right]\\left( \\begin{array}{c} x_i^t \\\\ y_i^t \\\\ 1 \\end{array} \\right)<br \/>\n$$<\/p>\n\n\n\n<p>Here, \\((x_i^t, y_i^t)\\) are the target coordinates of the target grid in the output feature map, \\((x_i^s, y_i^s)\\) are the source coordinates in the input feature map, and \\(\\mathbf{A}_\\theta\\) is the affine transformation matrix.<\/p>\n\n\n\n<p>After the sampling grid operation, we have the Differentiable Image Sampling.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Differentiable Image Sampling<\/h3>\n\n\n\n<p>This is the last part of the spatial transformer network. We have the input feature map and also the parameterized sampling grid with us now. To perform the sampling, we give the feature map \\(U\\) and sampling grid \\(\\mathcal{T}_\\theta(G)\\) as input to the sampler (see <strong>figure 2<\/strong>). The sampling kernel is applied at the source coordinates and we get the output \\(V\\).<\/p>\n\n\n\n<p>There is a lot of mathematics involved in this last part which I am skipping here. If you read the <strong><a href=\"https:\/\/arxiv.org\/abs\/1506.02025\" target=\"_blank\" rel=\"noreferrer noopener\">paper<\/a><\/strong>, then you will get to know it in much more detail. Although for the coding part, whatever we have covered should be enough. Still, if you want, you can give the paper a read before you move further. 
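To make the grid generator and the sampler more concrete, here is a minimal, standalone sketch (not from this article's code) using PyTorch's F.affine_grid and F.grid_sample, the same functions our model will call later. An identity affine matrix produces a sampling grid that returns the input unchanged, and gradients flow back to the transformation parameters, which is exactly why the module is trainable with backpropagation.

```python
import torch
import torch.nn.functional as F

# A tiny 4x4 single-channel "feature map" U
x = torch.arange(16.0).reshape(1, 1, 4, 4)

# Identity affine matrix A_theta of shape (N, 2, 3)
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]], requires_grad=True)

# Grid generator: T_theta(G), one sampling location per output pixel
grid = F.affine_grid(theta, x.size(), align_corners=False)

# Sampler: bilinear sampling of U at the grid locations gives V
v = F.grid_sample(x, grid, align_corners=False)

print(torch.allclose(v, x, atol=1e-5))  # True: identity transform returns U
v.sum().backward()
print(theta.grad is not None)           # True: sampling is differentiable in theta
```

Swapping the identity matrix for, say, [[0, 1, 0], [-1, 0, 0]] would rotate the sampled image; learning to output such matrices is exactly the localization network's job.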
That will surely help you understand much of the coding easily.<\/p>\n\n\n\n<p>From the next section, we will dive into the coding part of this tutorial.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Directory Structure and Some Prerequisites<\/h2>\n\n\n\n<p>Before you move further, make sure that you install the latest version of <strong><a href=\"https:\/\/pytorch.org\/get-started\/locally\/\" target=\"_blank\" rel=\"noreferrer noopener\">PyTorch<\/a><\/strong> (1.6 at the time of writing this) from <strong><a href=\"https:\/\/pytorch.org\/get-started\/locally\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a><\/strong>. This will make sure that you have all the functionalities available to follow along smoothly.<\/p>\n\n\n\n<p>The PyTorch tutorials have a <strong><a href=\"https:\/\/pytorch.org\/tutorials\/intermediate\/spatial_transformer_tutorial.html\" target=\"_blank\" rel=\"noreferrer noopener\">Spatial Transformer Networks Tutorial<\/a><\/strong> which uses the digit MNIST dataset. But we will work with the CIFAR10 dataset. 
This will ensure that we have a bit more complexity to handle and also we will learn how to deal with RGB (colored) images instead of grayscale images using Spatial Transformer Networks.<\/p>\n\n\n\n<p>Now coming to the project directory structure.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">\u251c\u2500\u2500\u2500input\n\u2502   \u2514\u2500\u2500\u2500data\n\u2502\n\u251c\u2500\u2500\u2500outputs\n\u2502       image_0.png\n\u2502       image_1.png\n...\n\u2502       transformed_imgs.gif\n\u2502\n\u2514\u2500\u2500\u2500src\n    \u2502   model.py\n    \u2502   train.py<\/pre>\n\n\n\n<ul class=\"wp-block-list\"><li>The <code>input<\/code> folder will contain the CIFAR10 dataset.<\/li><li>The <code>outputs<\/code> folder will contain all the outputs that the code generates.<\/li><li>In the <code>src<\/code> folder, we have the python scripts. They are <code>model.py<\/code> and <code>train.py<\/code>.<\/li><\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Implementing Spatial Transformer Network using PyTorch<\/h2>\n\n\n\n<p>I hope that you have set up your directory as per the above structure. From here onward, we will write the code for this tutorial. First, we will build the Spatial Transformer Network architecture. We will write that code inside the <code>model.py<\/code> file. Then we will write the code to prepare the CIFAR10 data, training, and validation function inside the <code>train.py<\/code> file.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Preparing the Spatial Transformer Network Architecture<\/h3>\n\n\n\n<p>In this section, we will write the PyTorch code for the Spatial Transformer Network Architecture. 
<strong><em>This code will go into the <\/em><\/strong><code>model.py<\/code><strong><em> file inside the <\/em><\/strong><code>src<\/code><strong><em> folder.<\/em><\/strong><\/p>\n\n\n\n<p>First, we will write the whole network code in one code block. Then we will get to the explanation part. The following code block defines the Spatial Transformer Network architecture.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nclass STN(nn.Module):\n    def __init__(self):\n        super(STN, self).__init__()\n        # simple convnet classifier\n        self.conv1 = nn.Conv2d(3, 6, 5)\n        self.pool = nn.MaxPool2d(2, 2)\n        self.conv2 = nn.Conv2d(6, 16, 5)\n        self.fc1 = nn.Linear(16 * 5 * 5, 120)\n        self.fc2 = nn.Linear(120, 84)\n        self.fc3 = nn.Linear(84, 10)\n\n        # spatial transformer localization network\n        self.localization = nn.Sequential(\n            nn.Conv2d(3, 64, kernel_size=7),\n            nn.MaxPool2d(2, stride=2),\n            nn.ReLU(True),\n            nn.Conv2d(64, 128, kernel_size=5),\n            nn.MaxPool2d(2, stride=2),\n            nn.ReLU(True)\n        )\n\n        # transformation regressor for theta\n        self.fc_loc = nn.Sequential(\n            nn.Linear(128*4*4, 256),\n            nn.ReLU(True),\n            nn.Linear(256, 3 * 2)\n        )\n\n        # initializing the weights and biases with the identity transformation\n        self.fc_loc[2].weight.data.zero_()\n        self.fc_loc[2].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], \n                                                    dtype=torch.float))\n\n    def stn(self, x):\n        xs = self.localization(x)\n        xs = xs.view(-1, xs.size(1)*xs.size(2)*xs.size(3))\n\n        # calculate the transformation parameters theta\n        theta = self.fc_loc(xs)\n        # resize theta\n        theta = theta.view(-1, 2, 3) \n        # grid generator => transformation on parameters theta\n        grid = F.affine_grid(theta, x.size())\n        # grid sampling => applying the spatial transformations\n        x = F.grid_sample(x, grid)\n\n        return x\n\n    def forward(self, x):\n        # transform the input\n        x = self.stn(x)\n        \n        # forward pass through the classifier \n        x = self.pool(F.relu(self.conv1(x)))\n        x = self.pool(F.relu(self.conv2(x)))\n        x = x.view(-1, 16*5*5)\n        x = F.relu(self.fc1(x))\n        x = F.relu(self.fc2(x))\n        x = self.fc3(x)\n        return F.log_softmax(x, dim=1)<\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">Explanation of the STN Architecture<\/h4>\n\n\n\n<p>I know that the above code looks complicated but I will try my best to make it as simple as possible.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Starting from <strong>line 5<\/strong>, we have the <code>STN()<\/code> class which contains the STN architecture. <\/li><li>From <strong>line 6<\/strong>, we have the <code>__init__()<\/code> function. In the <code>__init__()<\/code> function, from <strong>line 9 till 14<\/strong>, we define a simple convolutional classifier network to classify the CIFAR10 dataset images. I hope that this classification network is quite self-explanatory.<\/li><li>Starting from <strong>line 17 till 24<\/strong>, we have the <strong><em>Localization Network<\/em><\/strong> (<code>self.localization<\/code>) of the Spatial Transformer Network. First, we have a 2D convolutional layer on <strong>line 18<\/strong> with 3 input channels, as the CIFAR10 dataset&#8217;s images are colored with three channels (RGB). It is followed by max-pooling and ReLU activation. We repeat a similar convolution, max-pooling, and ReLU block from <strong>line 21 till 23<\/strong>. 
<\/li><li>Now to regress the transformation parameters \\(\\theta\\), we need fully-connected linear layers. This is exactly what the <code>self.fc_loc<\/code> module does from <strong>line 27 to 31<\/strong>. Now, you will see that the first linear layer&#8217;s input features are 128*4*4. This number comes from the shape of the <code>self.localization<\/code> module&#8217;s last layer&#8217;s output, which is 128 channels of 4x4 feature maps for a 32x32 input image.<\/li><li>From <strong>line 34 to 35<\/strong>, we initialize the <code>self.fc_loc<\/code> module&#8217;s last linear layer&#8217;s weights and biases. We initialize them with the identity transformation.<\/li><li>Next up, we have the <code>stn()<\/code> function from <strong>line 38<\/strong>. First, we get the feature maps using the <code>self.localization<\/code> module. Then we resize them and pass them onto the <code>self.fc_loc<\/code> module to get the transformation parameters <code>theta<\/code> on <strong>line 43<\/strong>. On <strong>line 47<\/strong>, we generate the parameterized sampling grid using the <code>affine_grid()<\/code> function. Finally, we apply the spatial transformations on <strong>line 49<\/strong>. We return the transformed feature maps on <strong>line 51<\/strong>.<\/li><li>Finally, we have the <code>forward()<\/code> function from <strong>line 53<\/strong>. First, we execute the <code>stn()<\/code> function to get the transformed inputs. Then, from <strong>line 57<\/strong>, we perform a simple forward pass through the classification network using these transformed feature maps.<\/li><\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Some Important Notes<\/h4>\n\n\n\n<p>I will try to answer an important question that some of you may have before moving further.<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><em>Why do we need to perform a classification after spatially transforming the inputs?<\/em><ul><li><strong><em>This is a very valid question actually. 
Let&#8217;s say that we spatially transform the inputs and visualize how they look. Now what? We need some measurement criteria to determine how good the spatial transformations are, right? For that we can simply classify the transformed images from the Spatial Transformer Network instead of the original images. And with each epoch we will try to reduce the loss just as we do with general classification. The feedback from the backpropagation will force the network to return better spatial transformations with each epoch. We will also visualize in the end how with each passing epoch, the STN transforms the images spatially. I hope that this answers some of your questions.<\/em><\/strong><\/li><\/ul><\/li><\/ul>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Writing the Code to Train the STN on the CIFAR10 Dataset<\/h2>\n\n\n\n<p>This part is going to be easy. We will write the code to:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><em>Prepare the CIFAR10 dataset.<\/em><\/li><li><em>Define the learning parameters for our Spatial Transformer Network.<\/em><\/li><li><em>Write the training and validation functions.<\/em><\/li><li><em>And finally, visualize the transformed images.<\/em><\/li><\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>This part will not need much explanation as you will already be familiar with all the above steps. 
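Before writing the training code, one number from the model is worth double-checking: the 128*4*4 input size of the first linear layer in self.fc_loc. A short, standalone sketch (not part of this tutorial's scripts) can trace a 32x32 CIFAR10-sized input through the localization layers to confirm it:

```python
import torch
import torch.nn as nn

# The same localization layers as in model.py, repeated here only to
# verify the flattened feature size expected by fc_loc
localization = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7),    # (3, 32, 32) -> (64, 26, 26)
    nn.MaxPool2d(2, stride=2),          # -> (64, 13, 13)
    nn.ReLU(True),
    nn.Conv2d(64, 128, kernel_size=5),  # -> (128, 9, 9)
    nn.MaxPool2d(2, stride=2),          # -> (128, 4, 4)
    nn.ReLU(True)
)

out = localization(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 128, 4, 4]), i.e. 128*4*4 = 2048 features
```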
These steps are conventional to any image classification task using deep learning and PyTorch.<\/p>\n\n\n\n<p><strong><em>All the code from here onward, will go into the<\/em><\/strong> <code>train.py<\/code> <strong><em>file<\/em><\/strong>.<\/p>\n\n\n\n<p>Let&#8217;s start with the imports.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">import torch\nimport torch.optim as optim\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torchvision\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport model\nimport imageio\n\nfrom torch.utils.data import DataLoader, Dataset\nfrom torchvision import datasets, transforms\nfrom tqdm import tqdm<\/pre>\n\n\n\n<p>The above are all the imports that we need. We need the <code>imageio<\/code> module as we will be saving the transformed images from each epoch as a <code>.gif<\/code> file. 
We will analyze this short video file in the end.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Define the Learning Parameters, Transforms, and Computation Device<\/h3>\n\n\n\n<p>Next, we will define the learning parameters, the image transforms for the CIFAR10 dataset, and the computation device for training.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># learning parameters\nlearning_rate = 0.001\nepochs = 40\nbatch_size = 64\n\n# image transforms\ntransform = transforms.Compose([\n                       transforms.ToTensor(),\n                       transforms.Normalize((0.4914, 0.4822, 0.4465), \n                                            (0.2023, 0.1994, 0.2010)),\n                   ])\n# computation device\ndevice = torch.device('cuda' if torch.cuda.is_available() else 'cpu')<\/pre>\n\n\n\n<p>We will be using a batch size of 64. For the image transforms, we are just converting the images to tensors and normalizing them with the CIFAR10 channel statistics.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Prepare the Training and Validation Dataset<\/h3>\n\n\n\n<p>The following block of code prepares the training and validation datasets. 
We will use the <code>datasets<\/code> module of <code>torchvision<\/code> to get the CIFAR10 dataset.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># train and validation datasets\ntrain_data = datasets.CIFAR10(\n    root='..\/input\/data',\n    train=True,\n    download=True,\n    transform=transform\n)\nval_data = datasets.CIFAR10(\n    root='..\/input\/data',\n    train=False,\n    download=True,\n    transform=transform\n)<\/pre>\n\n\n\n<p>The next block of code will prepare the training and validation data loaders.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># train data loader\ntrain_loader = DataLoader(\n    train_data, \n    batch_size=batch_size,\n    shuffle=True\n)\n# validation data loader\nval_loader = DataLoader(\n    val_data, \n    batch_size=batch_size,\n    shuffle=False\n)<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Initialize the Model, Optimizer, and Loss Function<\/h3>\n\n\n\n<p>Here, we will initialize the <code>STN()<\/code> model first. 
We will use the <code>SGD<\/code> optimizer and the CrossEntropy loss function.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># initialize the model\nmodel = model.STN().to(device)\n# initialize the optimizer\noptimizer = optim.SGD(model.parameters(), lr=learning_rate)\n# initialize the loss function\ncriterion = nn.CrossEntropyLoss()<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Define the Training Function<\/h3>\n\n\n\n<p>We will write the training function now, that is the <code>fit()<\/code> function. It is a very simple function that you must have seen a lot of times before.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># training function\ndef fit(model, dataloader, optimizer, criterion, train_data):\n    print('Training')\n    model.train()\n    train_running_loss = 0.0\n    train_running_correct = 0\n    for i, data in tqdm(enumerate(dataloader), total=int(len(train_data)\/dataloader.batch_size)):\n        data, target = data[0].to(device), data[1].to(device)\n        optimizer.zero_grad()\n        outputs = model(data)\n        loss = criterion(outputs, target)\n        train_running_loss += loss.item()\n        _, preds = torch.max(outputs.data, 1)\n        train_running_correct += (preds == target).sum().item()\n        loss.backward()\n        optimizer.step()\n        \n    train_loss = train_running_loss\/len(dataloader.dataset)\n    train_accuracy = 100. * train_running_correct\/len(dataloader.dataset)\n    return train_loss, train_accuracy<\/pre>\n\n\n\n<p>Basically, for each batch of images we are:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><em>Calculating the loss and accuracy.<\/em><\/li><li><em>Backpropagating the loss.<\/em><\/li><li><em>And updating the optimizer parameters.<\/em><\/li><\/ul>\n\n\n\n<p><\/p>\n\n\n\n<p>Finally, for each epoch we are returning the accuracy and loss values.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Define the Validation Function<\/h3>\n\n\n\n<p>For the validation function, we will not need to backpropagate the loss or update the optimizer parameters.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># validation function\ndef validate(model, dataloader, optimizer, criterion, val_data):\n    print('Validating')\n    model.eval()\n    val_running_loss = 0.0\n    val_running_correct = 0\n    with torch.no_grad():\n        for i, data in tqdm(enumerate(dataloader), total=int(len(val_data)\/dataloader.batch_size)):\n            data, target = data[0].to(device), data[1].to(device)\n            outputs = model(data)\n            loss = criterion(outputs, target)\n            \n            val_running_loss += loss.item()\n            _, preds = torch.max(outputs.data, 1)\n            val_running_correct += (preds == target).sum().item()\n        \n        val_loss = val_running_loss\/len(dataloader.dataset)\n        val_accuracy = 100. * val_running_correct\/len(dataloader.dataset)        \n        return val_loss, val_accuracy<\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">Transforming the Output Images to NumPy Format<\/h3>\n\n\n\n<p>We will be saving one batch of images for each epoch from the validation set after running it through the <code>STN()<\/code> model. 
But we cannot save the transformed PyTorch images directly. We will first have to convert the images to NumPy format and denormalize the grid of images as well.<\/p>\n\n\n\n<p>The following function, <code>transform_to_numpy()<\/code>, does that for us.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">def transform_to_numpy(image_grid, epoch):\n    \"\"\"\n    This function transforms the PyTorch image grids\n    into NumPy format that we will denormalize and save \n    as PNG file.\n    \"\"\"\n    image_grid = image_grid.numpy().transpose((1, 2, 0))\n    # the same CIFAR10 statistics that we used in transforms.Normalize\n    mean = np.array([0.4914, 0.4822, 0.4465])\n    std = np.array([0.2023, 0.1994, 0.2010])\n    image_grid = std * image_grid + mean\n    image_grid = np.clip(image_grid, 0, 1)\n    return image_grid<\/pre>\n\n\n\n<p>We can also use the <code>save_image()<\/code> function from torchvision but the above function will also help us in saving the image grids as <code>.gif<\/code> files. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Writing the Code to Get One Batch of Validation Data from the STN Model<\/h3>\n\n\n\n<p>To visualize how well our model is doing, we will pass one batch of images through the <code>STN()<\/code> model. 
We will save that output as a PNG file and also use the <code>imageio<\/code> module to save it as a <code>.gif<\/code> file.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">images = []\ndef stn_grid(epoch):\n    \"\"\"\n    This function will pass one batch of the validation\n    images to the STN model and get the transformed images\n    after each epoch to save as a PNG file and also as\n    a frame of the final GIF file.\n    \"\"\"\n    with torch.no_grad():\n        data = next(iter(val_loader))[0].to(device)\n\n        transformed_image = model.stn(data).cpu().detach()\n\n        image_grid = torchvision.utils.make_grid(transformed_image)\n\n        # save the grid image\n        image_grid = transform_to_numpy(image_grid, epoch)\n        plt.imshow(image_grid)\n        plt.savefig(f\"..\/outputs\/image_{epoch}.png\")\n        plt.close()\n        images.append(image_grid)<\/pre>\n\n\n\n<p>The <code>images<\/code> list will store all the image grids that we get from the <code>transform_to_numpy()<\/code> function. 
We are appending those NumPy image grids to <code>images<\/code> at <strong>line 21<\/strong>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Training the STN Model<\/h3>\n\n\n\n<p>For training, we just have to run a simple <code>for<\/code> loop for the number of epochs that we want to train.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># train for certain epochs\nfor epoch in range(epochs):\n    print(f\"Epoch {epoch+1} of {epochs}\")\n    train_epoch_loss, train_epoch_accuracy = fit(model, train_loader, \n                                                 optimizer, criterion, \n                                                 train_data)\n    val_epoch_loss, val_epoch_accuracy = validate(model, val_loader, \n                                                 optimizer, criterion, \n                                                 val_data)\n    print(f\"Train Loss: {train_epoch_loss:.4f}, Train Acc: {train_epoch_accuracy:.2f}\")\n    print(f\"Validation Loss: {val_epoch_loss:.4f}, Val Acc: {val_epoch_accuracy:.2f}\")\n    stn_grid(epoch)<\/pre>\n\n\n\n<p>Note that at <strong>line 12<\/strong> we are calling the <code>stn_grid()<\/code> function to pass one batch of validation data through the spatial transformer and save the transformed image grid.<\/p>\n\n\n\n<p>The final step is to save all the NumPy image grids as a <code>.gif<\/code> file using the <code>imageio<\/code> module. <\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"python\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">imageio.mimsave('..\/outputs\/transformed_imgs.gif', images)<\/pre>\n\n\n\n<p>That&#8217;s it. 
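<\/p>\n\n\n\n<p>One caveat worth noting: <code>imageio<\/code> prefers <code>uint8<\/code> data, and passing the denormalized float grids directly may trigger a lossy-conversion warning. The following is a minimal sketch of an explicit conversion; the <code>to_uint8()<\/code> helper is hypothetical and not part of the original code.<\/p>\n\n\n\n

```python
import numpy as np

# Hypothetical helper: clip the denormalized float frames to [0, 1]
# and rescale to 0..255 before handing them to imageio.mimsave().
def to_uint8(frame):
    return (np.clip(frame, 0.0, 1.0) * 255).astype(np.uint8)

frame = np.array([[-0.1, 0.5], [1.2, 1.0]])  # dummy float "image"
print(to_uint8(frame))  # [[  0 127]
                        #  [255 255]]
```

\n\n\n\n<p>With this helper, the call above would become <code>imageio.mimsave('..\/outputs\/transformed_imgs.gif', [to_uint8(img) for img in images])<\/code>.<\/p>\n\n\n\n<p>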
This is all the code that we need for training our <code>STN()<\/code> model.<\/p>\n\n\n\n<p>Now, let&#8217;s execute <code>train.py<\/code> and see how well our model performs.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Executing the train.py File<\/h2>\n\n\n\n<p>Open up your terminal\/command prompt and <code>cd<\/code> into the <code>src<\/code> folder. Now, execute the <code>train.py<\/code> file.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"generic\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">python train.py<\/pre>\n\n\n\n<p>I am showing the truncated output below.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"raw\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">Epoch 1 of 40\nTraining\n  0%|                                                                          | 0\/781 [00:00&lt;?\n782it [00:21, 35.72it\/s]\nValidating\n157it [00:03, 44.73it\/s]\nTrain Loss: 0.0353, Train Acc: 16.09\nValidation Loss: 0.0323, Val Acc: 25.98\n...\nEpoch 40 of 40\nTraining\n782it [00:20, 37.69it\/s]\nValidating\n157it [00:03, 44.57it\/s]\nTrain Loss: 0.0095, Train Acc: 78.52\nValidation Loss: 0.0184, Val Acc: 63.75<\/pre>\n\n\n\n<p>By the end of 40 epochs, we have a training accuracy of 78.52% and a validation accuracy of 63.75%. The training loss is 0.0095 and the validation loss is 0.0184. The results are not too good. 
Still, let&#8217;s see how well our model has spatially transformed the images.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Visualizing the Spatial Transformations Done by the STN Model<\/h2>\n\n\n\n<p>The following image shows the results after the first epoch.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/image_0.png\" alt=\"Spatially transformed images after the first epoch.\" class=\"wp-image-8023\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/image_0.png 640w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/image_0-300x225.png 300w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><figcaption><strong>Figure 4. Spatially transformed images after the first epoch.<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<p>In <strong>figure 4<\/strong>, we can see that the spatial transformations are not too evident. This is probably because it is only the first epoch and the neural network has not learned much yet. Let&#8217;s see the results from the last epoch.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"480\" src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/image_39.png\" alt=\"Result of Spatial Transformation Network after 40 epochs.\" class=\"wp-image-8025\" srcset=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/image_39.png 640w, https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/image_39-300x225.png 300w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><figcaption><strong>Figure 5. Result of Spatial Transformation Network after 40 epochs.<\/strong><\/figcaption><\/figure><\/div>\n\n\n\n<p><strong>Figure 5<\/strong> shows the results from epoch 40, the last epoch. 
The spatial transformations here are very prominent. Our Spatial Transformer Network model has cropped and resized most of the images to the center. It has also rotated many of the images to an orientation that it feels will be helpful, although some of them are still not centered. Maybe a bit more training will help.<\/p>\n\n\n\n<p>Finally, let&#8217;s take a look at the <code>.gif<\/code> file that we have saved. This short video will give us the best idea of how our Spatial Transformer Network performs after each epoch.<\/p>\n\n\n\n<figure class=\"wp-block-video aligncenter\"><video height=\"400\" style=\"aspect-ratio: 400 \/ 400;\" width=\"400\" autoplay controls loop src=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/transformed_images.mp4\"><\/video><figcaption><strong>Clip 1. Images transformed by the Spatial Transformer Neural Network after each epoch.<\/strong><\/figcaption><\/figure>\n\n\n\n<p><strong>Clip 1<\/strong> shows the images transformed by the Spatial Transformer Network after each epoch. We can see that after each epoch, the neural network is resizing, cropping, and centering the images a bit better. Still, more training would probably help even further.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Summary and Conclusion<\/h2>\n\n\n\n<p>In this tutorial, you learned about Spatial Transformer Networks. You got to know the basics and also implemented the code for a Spatial Transformer Network using PyTorch. This is a starting point, and you can now start to experiment further by improving this code. Most probably, we will implement some more advanced spatial transformation techniques in future articles.<\/p>\n\n\n\n<p>If you have any doubts, suggestions, or thoughts, then you can leave them in the comment section. 
I will surely address them.<\/p>\n\n\n\n<p>You can contact me using the <strong><a aria-label=\"Contact (opens in a new tab)\" href=\"https:\/\/debuggercafe.com\/contact-us\/\" target=\"_blank\" rel=\"noreferrer noopener\">Contact<\/a><\/strong> section. You can also find me on <strong><a aria-label=\"LinkedIn (opens in a new tab)\" href=\"https:\/\/www.linkedin.com\/in\/sovit-rath\/\" target=\"_blank\" rel=\"noreferrer noopener\">LinkedIn<\/a><\/strong>, and <strong><a href=\"https:\/\/x.com\/SovitRath5\" target=\"_blank\" rel=\"noreferrer noopener\">X<\/a><\/strong>.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article describes the concept of Spatial Transformer Networks in deep learning and computer vision. The readers will also get hands-on coding experience by applying the concepts using PyTorch framework.<\/p>\n","protected":false},"author":1,"featured_media":8029,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[76,113,59,119,17,57,90],"tags":[77,112,61,123,24,62,91],"class_list":["post-7895","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-computer-vision","category-convolutional-neural-networks","category-deep-learning","category-image-classification","category-machine-learning","category-neural-networks","category-pytorch","tag-computer-vision","tag-convolutional-neural-networks","tag-deep-learning","tag-image-classification","tag-machine-learning","tag-neural-networks","tag-pytorch"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Spatial Transformer Network using PyTorch<\/title>\n<meta name=\"description\" content=\"Know about Spatial Transformer Networks in deep learning and apply the concepts using the PyTorch framework.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, 
max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Spatial Transformer Network using PyTorch\" \/>\n<meta property=\"og:description\" content=\"Know about Spatial Transformer Networks in deep learning and apply the concepts using the PyTorch framework.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/\" \/>\n<meta property=\"og:site_name\" content=\"DebuggerCafe\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/profile.php?id=100013731104496\" \/>\n<meta property=\"article:published_time\" content=\"2020-09-14T00:30:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-01-08T11:39:26+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/Spatial-Transformer-Network-using-PyTorch-e1599010243710.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"675\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Sovit Ranjan Rath\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@SovitRath5\" \/>\n<meta name=\"twitter:site\" content=\"@SovitRath5\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sovit Ranjan Rath\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"17 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/\"},\"author\":{\"name\":\"Sovit Ranjan Rath\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\"},\"headline\":\"Spatial Transformer Network using PyTorch\",\"datePublished\":\"2020-09-14T00:30:00+00:00\",\"dateModified\":\"2023-01-08T11:39:26+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/\"},\"wordCount\":2972,\"commentCount\":4,\"image\":{\"@id\":\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/Spatial-Transformer-Network-using-PyTorch-e1599010243710.jpg\",\"keywords\":[\"Computer Vision\",\"Convolutional Neural Networks\",\"Deep Learning\",\"Image Classification\",\"Machine Learning\",\"Neural Networks\",\"PyTorch\"],\"articleSection\":[\"Computer Vision\",\"Convolutional Neural Networks\",\"Deep Learning\",\"Image Classification\",\"Machine Learning\",\"Neural Networks\",\"PyTorch\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/\",\"url\":\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/\",\"name\":\"Spatial Transformer Network using 
PyTorch\",\"isPartOf\":{\"@id\":\"https:\/\/debuggercafe.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/Spatial-Transformer-Network-using-PyTorch-e1599010243710.jpg\",\"datePublished\":\"2020-09-14T00:30:00+00:00\",\"dateModified\":\"2023-01-08T11:39:26+00:00\",\"author\":{\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\"},\"description\":\"Know about Spatial Transformer Networks in deep learning and apply the concepts using the PyTorch framework.\",\"breadcrumb\":{\"@id\":\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#primaryimage\",\"url\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/Spatial-Transformer-Network-using-PyTorch-e1599010243710.jpg\",\"contentUrl\":\"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/Spatial-Transformer-Network-using-PyTorch-e1599010243710.jpg\",\"width\":1200,\"height\":675,\"caption\":\"Spatial Transformer Network using PyTorch\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/debuggercafe.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Spatial Transformer Network using 
PyTorch\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/debuggercafe.com\/#website\",\"url\":\"https:\/\/debuggercafe.com\/\",\"name\":\"DebuggerCafe\",\"description\":\"Machine Learning and Deep Learning\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/debuggercafe.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752\",\"name\":\"Sovit Ranjan Rath\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/debuggercafe.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g\",\"caption\":\"Sovit Ranjan Rath\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Spatial Transformer Network using PyTorch","description":"Know about Spatial Transformer Networks in deep learning and apply the concepts using the PyTorch framework.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/","og_locale":"en_US","og_type":"article","og_title":"Spatial Transformer Network using PyTorch","og_description":"Know about Spatial Transformer Networks in deep learning and apply the concepts using the PyTorch framework.","og_url":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/","og_site_name":"DebuggerCafe","article_publisher":"https:\/\/www.facebook.com\/profile.php?id=100013731104496","article_published_time":"2020-09-14T00:30:00+00:00","article_modified_time":"2023-01-08T11:39:26+00:00","og_image":[{"width":1200,"height":675,"url":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/Spatial-Transformer-Network-using-PyTorch-e1599010243710.jpg","type":"image\/jpeg"}],"author":"Sovit Ranjan Rath","twitter_card":"summary_large_image","twitter_creator":"@SovitRath5","twitter_site":"@SovitRath5","twitter_misc":{"Written by":"Sovit Ranjan Rath","Est. 
reading time":"17 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#article","isPartOf":{"@id":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/"},"author":{"name":"Sovit Ranjan Rath","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752"},"headline":"Spatial Transformer Network using PyTorch","datePublished":"2020-09-14T00:30:00+00:00","dateModified":"2023-01-08T11:39:26+00:00","mainEntityOfPage":{"@id":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/"},"wordCount":2972,"commentCount":4,"image":{"@id":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#primaryimage"},"thumbnailUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/Spatial-Transformer-Network-using-PyTorch-e1599010243710.jpg","keywords":["Computer Vision","Convolutional Neural Networks","Deep Learning","Image Classification","Machine Learning","Neural Networks","PyTorch"],"articleSection":["Computer Vision","Convolutional Neural Networks","Deep Learning","Image Classification","Machine Learning","Neural Networks","PyTorch"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/","url":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/","name":"Spatial Transformer Network using 
PyTorch","isPartOf":{"@id":"https:\/\/debuggercafe.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#primaryimage"},"image":{"@id":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#primaryimage"},"thumbnailUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/Spatial-Transformer-Network-using-PyTorch-e1599010243710.jpg","datePublished":"2020-09-14T00:30:00+00:00","dateModified":"2023-01-08T11:39:26+00:00","author":{"@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752"},"description":"Know about Spatial Transformer Networks in deep learning and apply the concepts using the PyTorch framework.","breadcrumb":{"@id":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#primaryimage","url":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/Spatial-Transformer-Network-using-PyTorch-e1599010243710.jpg","contentUrl":"https:\/\/debuggercafe.com\/wp-content\/uploads\/2020\/09\/Spatial-Transformer-Network-using-PyTorch-e1599010243710.jpg","width":1200,"height":675,"caption":"Spatial Transformer Network using PyTorch"},{"@type":"BreadcrumbList","@id":"https:\/\/debuggercafe.com\/spatial-transformer-network-using-pytorch\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/debuggercafe.com\/"},{"@type":"ListItem","position":2,"name":"Spatial Transformer Network using PyTorch"}]},{"@type":"WebSite","@id":"https:\/\/debuggercafe.com\/#website","url":"https:\/\/debuggercafe.com\/","name":"DebuggerCafe","description":"Machine Learning and Deep 
Learning","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/debuggercafe.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/27719b14d930bd4a88ade40d18b0a752","name":"Sovit Ranjan Rath","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/debuggercafe.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f71ca13ec56d630e7d8045e8b846396068791aa204936c3d74d721c6dd2b4d3c?s=96&d=mm&r=g","caption":"Sovit Ranjan Rath"}}]}},"_links":{"self":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/7895","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/comments?post=7895"}],"version-history":[{"count":125,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/7895\/revisions"}],"predecessor-version":[{"id":8028,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/posts\/7895\/revisions\/8028"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/media\/8029"}],"wp:attachment":[{"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/media?parent=7895"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/categories?post=7895"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/debuggercafe.com\/wp-json\/wp\/v2\/tags?post=7895"}],"curies":[{"name":"wp","href":"h
ttps:\/\/api.w.org\/{rel}","templated":true}]}}