<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom"><generator uri="https://jekyllrb.com/" version="4.3.3">Jekyll</generator><link href="https://shubham1810.github.io/feed.xml" rel="self" type="application/atom+xml"/><link href="https://shubham1810.github.io/" rel="alternate" type="text/html"/><updated>2024-08-13T19:19:23+00:00</updated><id>https://shubham1810.github.io/feed.xml</id><title type="html">blank</title><subtitle>My Personal Webpage </subtitle><entry><title type="html">Displaying External Posts on Your al-folio Blog</title><link href="https://shubham1810.github.io/blog/2022/displaying-external-posts-on-your-al-folio-blog/" rel="alternate" type="text/html" title="Displaying External Posts on Your al-folio Blog"/><published>2022-04-23T23:20:09+00:00</published><updated>2022-04-23T23:20:09+00:00</updated><id>https://shubham1810.github.io/blog/2022/displaying-external-posts-on-your-al-folio-blog</id><content type="html" xml:base="https://shubham1810.github.io/blog/2022/displaying-external-posts-on-your-al-folio-blog/"><![CDATA[]]></content><author><name></name></author></entry><entry><title type="html">a post with redirect</title><link href="https://shubham1810.github.io/blog/2021/redirect/" rel="alternate" type="text/html" title="a post with redirect"/><published>2021-07-04T17:39:00+00:00</published><updated>2021-07-04T17:39:00+00:00</updated><id>https://shubham1810.github.io/blog/2021/redirect</id><content type="html" xml:base="https://shubham1810.github.io/blog/2021/redirect/"><![CDATA[]]></content><author><name></name></author><summary type="html"><![CDATA[you can also redirect to assets like pdf]]></summary></entry><entry><title type="html">Semi-Implicit Networks</title><link href="https://shubham1810.github.io/blog/2019/imex/" rel="alternate" type="text/html" title="Semi-Implicit 
Networks"/><published>2019-07-19T11:00:00+00:00</published><updated>2019-07-19T11:00:00+00:00</updated><id>https://shubham1810.github.io/blog/2019/imex</id><content type="html" xml:base="https://shubham1810.github.io/blog/2019/imex/"><![CDATA[<p>Residual Neural Networks, or ResNets <d-cite key="he2016deep"></d-cite>, have become popular in recent years, making it possible to train very deep neural networks while still achieving compelling performance. The core idea behind ResNets is the addition of skip-connections, which largely avoid the problem of vanishing gradients and hence make it easier for the network to be very deep <em>(for example, ResNet has 152 layers, compared to VGG’s 19 layers <d-cite key="simonyan2014very"></d-cite> or GoogleNet’s 22 layers <d-cite key="szegedy2015going"></d-cite>).</em></p> <p>The similarity of the ResNet architecture to <a href="https://en.wikipedia.org/wiki/Ordinary_differential_equation">Ordinary Differential Equations</a> has received some attention in recent works <d-cite key="chen2018neural"></d-cite>. The connection raises the issue of forward stability of such methods, <em>i.e.</em> the model should not amplify perturbations such as noise, adversarial attacks or general changes in the given input as the features pass through the layers.</p> <p>This post closely follows <em>(read shamelessly copies)</em> the work presented in <strong>IMEXnet - Forward Stable Deep Neural Network</strong> <d-cite key="haber2019imexnet"></d-cite> (published in ICML 2019). In this work, the authors discuss the forward stability of residual architectures and the problems that can arise from using explicit methods for the ordinary differential equation form of ResNets. The authors also look closely at the <em>field of view</em> problem for tasks with high-dimensional output (such as image-to-image methods like segmentation, depth-estimation, super-resolution etc.). 
For solving tasks that involve high-dimensional output, several layers of Residual blocks are often employed in the network architecture to model interactions between far away pixels. The authors introduce an architecture based on Implicit-Explicit methods for the ODE/PDE form of the Residual Networks which enhances the field of view with an improvement in stability of the network.</p> <blockquote> <p>In this post, I have discussed the concept of semi-implicit methods provided by the authors. For a detailed view and experimental analysis, head over to their paper <d-cite key="haber2019imexnet"></d-cite> and their <a href="https://github.com/HaberGroup/SemiImplicitDNNs">GitHub repo</a>.</p> </blockquote> <h2 id="residual-method-as-ode">Residual Method as ODE</h2> <p>The $j^{th}$ layer of a residual network, which updates the features $Y_j$, can be written as:</p> <p>\begin{equation} Y_{j+1} = Y_{j} + h.f(Y_{j}, \theta_{j}) \tag{1} \label{eq:one} \end{equation}</p> <p>where $Y_{j+1}$ and $Y_j$ are the outputs of layers $j+1$ and $j$ respectively, $\theta_j$ is the layer parameter, $f$ is a non-linear function, and $h$ is the step size (usually set to 1). In problems related to images, the function $f$ is usually a series of convolutions, normalisations and activations. In this particular work, $f$ is taken to be:</p> <p>\begin{equation} f(Y, K_1, K_2, \alpha, \beta) = K_2 \sigma (N_{\alpha,\beta} (K_1 Y)) \tag{2} \label{eq:two} \end{equation}</p> <p>Here $K_1$ and $K_2$ are taken to be 3x3 convolutional kernels, $N_{\alpha,\beta}$ is the normalization layer and $\sigma$ is the non-linear activation function. This structure was taken from <d-cite key="he2016deep"></d-cite>. 
From the above equation we can see that each output pixel is computed from only a small 5x5 patch of the input (two stacked 3x3 convolutions), making it necessary to use a number of such blocks to obtain a wider field of view over the input image.</p> <h3 id="forward-euler-form">Forward Euler Form</h3> <p>The step described in \eqref{eq:one} is the <a href="http://web.mit.edu/10.001/Web/Course_Notes/Differential_Equations_Notes/node3.html">forward Euler</a> discretization of the following ODE:</p> <p>\begin{equation} \dot Y(t) = f(Y(t), \theta(t))<br/> Y(0) = Y_0 \tag{3} \label{eq:three} \end{equation}</p> <p>The features $Y(t)$ and the weights $\theta(t)$ are taken to be continuous functions in time, where $t$ corresponds to the depth of the network. Previously, explicit methods (such as the <a href="https://en.wikipedia.org/wiki/Midpoint_method">mid-point method</a> and the <a href="https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods">Runge Kutta method</a>) have been used to solve such equations, but they often suffer from a lack of stability. <a href="https://en.wikipedia.org/wiki/Explicit_and_implicit_methods">Explicit methods</a> are those in which the state $Y_{t+1}$ is described as a function of the previous state $Y_t$. 
With such explicit methods (as in the examples above), many small steps are usually needed to integrate the ODE over a long time interval.</p> <p>As mentioned in the paper, one way to improve the flow of information in the network modelled after ODEs is to make use of implicit methods, <em>i.e.</em> define the state $Y_{t+1}$ implicitly through an equation that itself involves $Y_{t+1}$.</p> <h2 id="semi-implicit-form-and-its-stability">Semi-Implicit Form and Its Stability</h2> <p>One of the simplest implicit methods, quite similar to the forward Euler method, is the <a href="http://web.mit.edu/10.001/Web/Course_Notes/Differential_Equations_Notes/node3.html">backward Euler method</a>, whose non-linear discretized form is:</p> <p>\begin{equation} Y_{j+1} - Y_{j} = h . f(Y_{j+1}, \theta_{j+1}) \tag{4} \label{eq:four} \end{equation}</p> <p>This method is stable for any choice of $h$ when the eigenvalues of the Jacobian of $f$ have no positive real part (see <a href="http://www.scholarpedia.org/article/Equilibrium">this article</a> for more details on the stability of methods w.r.t. second-order differential equations). If the given condition is satisfied, $h$ can be chosen large enough to simulate a large step size in the continuous form while being robust to small perturbations in the input information.</p> <p>It turns out that implicit methods are rather expensive to compute. In particular, equation \eqref{eq:four} is a non-linear problem which can be computationally expensive to solve. So rather than using a fully implicit or fully explicit method, the authors derive a combination in the form of an implicit-explicit (IMEX) or semi-implicit method.</p> <p>The key idea in IMEX methods is to divide the right-hand side of the ODE into two parts: a non-linear part that is treated explicitly and a linear part that is treated implicitly. The equation in IMEXnet is designed in such a way that it can be solved efficiently. 
The equation in \eqref{eq:three} will now be rewritten as:</p> <p>\begin{equation} \dot Y(t) = f(Y(t), \theta(t)) + LY(t) - LY(t) \tag{5} \label{eq:five} \end{equation}</p> <p>where the first part $f(Y(t), \theta(t)) + LY(t)$ is treated explicitly, while the second part $-LY(t)$ is treated implicitly.<br/> The matrix $L$ is chosen freely with the property of being easily invertible. A fair choice of $L$ can be modelled after a 3x3 convolution operation with symmetric positive-definite property, which makes it easy to invert (more on that later). The continuous equation can now be discretized as follows:</p> \[Y_{j+1} + hLY_{j+1} = Y_j + hf(Y_j, \theta_j) + hLY_j\] <p>which can be rearranged as:</p> <p>\begin{equation} Y_{j+1} = (I + hL)^{-1} (Y_j + hLY_j + hf(Y_j, \theta_j)) \tag{6} \label{eq:six} \end{equation}</p> <p>with $I$ being the identity matrix.<br/> In the above equation, the authors have shown that the forward part (while seemingly complex) is rather easy to compute and similar to that of a convolution. Furthermore, the authors claim that the network is always stable for a suitable choice of $L$, while having some favourable properties of implicit methods. The matrix $(I + hL)^{-1}$ is dense in nature, which avoids the field of view problem by using all pixels of the image in its computational step.</p> <p>The authors choose $L$ to be a laplacian matrix with a group convolution operator (group conv. was also used in AlexNet! <d-cite key="krizhevsky2012imagenet"></d-cite>). 
The weights of the matrix are taken as the following:</p> \[\begin{equation} L = \frac{1}{6} \begin{bmatrix} -1 &amp; -4 &amp; -1 \\ -4 &amp; 20 &amp; -4\\ -1 &amp; -4 &amp; -1 \end{bmatrix} \tag{7} \end{equation}\] <p>Before going into the discussion about the choice of $L$ and the stability of the method, a quick recap of the Laplace transform is due.</p> <blockquote> The <a href="https://en.wikipedia.org/wiki/Laplace_transform">Laplace transform</a> (taken from wikipedia) converts a function of a real variable $t$ to a function of a complex variable $s$. The Laplace transform of $f(t); t \ge 0$ is the function $F(s)$, which is a unilateral transform defined by: $$ F(s) = \int_{0}^{\infty} f(t) e^{-st} dt $$ A Laplacian matrix, on the other hand, is defined as $L = D - A$ for a graph $G$, where $A$ is the adjacency matrix and $D$ is the degree matrix of the graph $G$. </blockquote> <p>Now, on the stability of the method, the authors provide a wonderful example of a simplified setting with a model problem (as given below) and provide the reasoning for the aforementioned choice of $L$.</p> <p>\begin{equation} \dot Y(t) = \lambda Y(t)<br/> Y(0) = Y_0 \tag{8} \end{equation}</p> <p>and take $L = \alpha I$, where we choose $\alpha \ge 0$. (Refer to the paper for a complete proof). 
Based on the analysis, the authors choose $K_1 = -K_{2}^{\intercal}$ in equation \eqref{eq:two}, as discussed in <d-cite key="ruthotto2019deep"></d-cite>, and also impose bound constraints on the convolution weights to achieve a bound on the term $\lambda$, hence improving the stability of the model.</p> <p>An example of the field of view is shown here for IMEXnet.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <img class="img-fluid rounded z-depth-1" src="/assets/img/IMEX/FOV.png" alt="FOV"/> </div> </div> <p><br/></p> <h3 id="the-forward-pass">The Forward Pass</h3> <p>The authors show that, using already available and widely used tools such as auto-differentiation and the fast Fourier transform (<a href="https://en.wikipedia.org/wiki/Fast_Fourier_transform">FFT</a>), the linear system given below can be solved efficiently.</p> \[(I + hL)Y = B\] <p>where $L$ is constructed like a group-wise convolution as mentioned earlier and $B$ collects the explicit term.</p> <p>To solve the system efficiently, the authors make use of the <a href="https://en.wikipedia.org/wiki/Convolution_theorem">convolution theorem</a> in Fourier space. The theorem says that, for a convolution operation between a kernel $A$ and features $Y$, the convolution can be computed as:</p> <p>\begin{equation} A * Y = F^{-1}((FA) \odot (FY)) \tag{9} \label{eq:nine} \end{equation}</p> <p>where $F$ is the Fourier transform, $*$ is the convolution operator, and $\odot$ is the Hadamard product (element-wise multiplication). Here, we assume a <strong>periodic boundary</strong> on the image data (discussed in detail next). This implies that if we need to apply the inverse of the convolutional operator $A$, we can simply divide element-wise by the Fourier transform of $A$:</p> \[A^{-1} * Y = F^{-1}((FY) \oslash (FA))\] <p>In our case, the kernel $A$ is associated with the matrix $I + hL$, which is invertible. 
For example, when we choose $L$ to be positive semi-definite, we define:</p> \[L = B^{\intercal} B\] <p>where $B$ is a trainable group-convolution operator. Using Fourier methods, we need the convolutional kernel to have the same size as the image we convolve it with. This is done by generating a zero matrix of the same size as the image and inserting the entries of the kernel at the appropriate places.</p> <blockquote> For a more thorough explanation of how to construct this kernel for the Fourier method, refer to the book <d-cite key="hansen2006deblurring"></d-cite>. The periodic boundary condition and the positive semi-definite property of the kernel are important here to derive the final convolution kernel $A$ for the Fourier transform and its spectral decomposition. Specifically, chapters 3 and 4 of the book describe in detail how to form the convolution kernel (or Toeplitz matrix) for the <strong>BCCB (Block Circulant with Circulant Blocks)</strong> type of matrix. All BCCB matrices are normal, i.e. $A^{*} A = A A^{*}$. So, a basic outline to compute equation \eqref{eq:nine} is: <ol> <li>Compute the center of the kernel (after zero padding to match the size).</li> <li>Apply the corresponding circular shift over the kernel with the center.</li> <li>Compute the Fourier transform of the shifted kernel and of the image.</li> <li>Take the inverse Fourier transform of their element-wise product.</li> </ol> Refer to <d-cite key="hansen2006deblurring"></d-cite> for detailed information about the process, and to the <a href="https://en.wikipedia.org/wiki/Convolution_theorem">convolution theorem</a> for a proof of equation \eqref{eq:nine}. 
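<p>As a rough illustration of the outline above, here is a minimal NumPy sketch (not the authors' PyTorch code) that solves $(I + hL)Y = B$ for a single channel under the periodic boundary assumption. The helper names <code>kernel_to_fft</code>, <code>periodic_conv</code> and <code>implicit_solve</code>, the step size and the random right-hand side are assumptions made purely for this example; the kernel is the one from equation (7).</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np


def kernel_to_fft(kernel, shape):
    """Zero-pad a small kernel to `shape`, move its center to index (0, 0), take the 2-D FFT."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    # Circular shift so that the kernel center sits at the origin (periodic boundary).
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def periodic_conv(kernel, image):
    """Periodic convolution via the convolution theorem, as in equation (9)."""
    product = kernel_to_fft(kernel, image.shape) * np.fft.fft2(image)
    return np.real(np.fft.ifft2(product))


def implicit_solve(lap, rhs, h):
    """Solve (I + h L) Y = B by element-wise division in Fourier space."""
    symbol = 1.0 + h * kernel_to_fft(lap, rhs.shape)
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / symbol))


# The 3x3 kernel from equation (7).
L = np.array([[-1.0, -4.0, -1.0],
              [-4.0, 20.0, -4.0],
              [-1.0, -4.0, -1.0]]) / 6.0

rng = np.random.default_rng(0)
B = rng.standard_normal((16, 16))  # stand-in for the explicit term
Y = implicit_solve(L, B, h=0.5)

# Plug the solution back into (I + hL)Y and compare with B.
residual = np.abs(Y + 0.5 * periodic_conv(L, Y) - B).max()
print(residual)
</code></pre></figure>
<p>The printed residual should sit at the level of floating-point round-off, which is a quick way to check that the division in Fourier space really undoes the periodic convolution. A multi-channel, trainable version would follow the same pattern per group, but that part is left to the paper and the authors' repository.</p>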
</blockquote> <p>The authors capture the method neatly in the following PyTorch pseudo-code:</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <img class="img-fluid rounded z-depth-1" src="/assets/img/IMEX/algo.png" alt="Algorithm"/> </div> </div> <p><br/></p> <h3 id="computational-complexity">Computational Complexity</h3> <p>For a single-block ResNet with $m$ channels and an input image of size $s \times s$, the forward pass takes approximately $\mathcal{O}(m^2 s^2)$ operations and $\mathcal{O}(m^2)$ memory.</p> <p>For the IMEX network, the explicit step is essentially the same and is followed by the implicit step. The implicit step is a group-wise convolutional operation and requires $\mathcal{O}(m(s \log s)^2)$ additional operations. The $s \log s$ term results from the application of the Fourier transform. Since $\log s$ is typically much smaller than $m$, the additional cost can be considered insignificant.</p> <h2 id="final-notes">Final Notes</h2> <p>As for the effectiveness of the network, the authors provide some compelling results on problems such as segmentation on synthetic Q-tip images as a toy example, and depth-estimation over kitchen images from the NYU Depth V2 <d-cite key="silberman2012indoor"></d-cite> dataset. Examples taken from the paper are shown below.</p> <p>First, an example from the Q-tip segmentation:</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <img class="img-fluid rounded z-depth-1" src="/assets/img/IMEX/qtip.png" alt="Qtip"/> </div> </div> <p><br/></p> <p>And an example from the depth estimation for kitchen images taken from the NYU Depth V2 dataset <d-cite key="silberman2012indoor"></d-cite>.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <img class="img-fluid rounded z-depth-1" src="/assets/img/IMEX/nyu_depth.png" alt="NYU depth"/> </div> </div> <p><br/></p> <p>The authors also make note of further possibilities for choosing other models with similar implicit properties. 
They especially note a variant that can be used (called the diffusion-reaction problem):</p> \[\dot Y(t) = f(Y(t), \theta(t)) - LY(t)\] <p>Such equations can exhibit interesting behaviour, such as forming non-linear wave patterns. These systems have already been studied in rigorous detail, as mentioned in the paper.</p> <p>Some further work on this approach is also discussed in the paper <strong>Robust Learning with Implicit Residual Networks</strong> <d-cite key="reshniak2021robust"></d-cite>, but that is beyond the scope of this post for now.</p> <blockquote> NOTE: I have written this post as per my understanding of the paper, and for my learning. I have tried to summarize (mostly just copy) the paper to the best of my capability in a short duration. Any constructive reviews are welcome. </blockquote> <p>–</p>]]></content><author><name>Shubham Dokania</name></author><summary type="html"><![CDATA[Semi-Implicit Networks]]></summary></entry><entry><title type="html">Principal Component Analysis</title><link href="https://shubham1810.github.io/blog/2018/pca/" rel="alternate" type="text/html" title="Principal Component Analysis"/><published>2018-06-28T09:00:00+00:00</published><updated>2018-06-28T09:00:00+00:00</updated><id>https://shubham1810.github.io/blog/2018/pca</id><content type="html" xml:base="https://shubham1810.github.io/blog/2018/pca/"><![CDATA[<p>As we work with real world data, we notice that the complexity increases, both in terms of the dependency of variables on each other and the dimensionality (number of variables) of the problem. Several techniques exist for analysing such information and making it easier to extract important properties for the purpose of better computation and visualization. 
One such method is <strong>Principal Component Analysis (PCA)</strong>, which uses the variance of the data to extract the directions along which the data varies the most.</p> <p>One of the major applications of PCA is dimensionality reduction, which is attained by keeping only a few of the transformed variables (obtained by projecting the original variables onto the directions of maximum variance, the <em>principal components</em>).</p> <p>A few prerequisites for understanding PCA are: <a href="https://en.wikipedia.org/wiki/Covariance_matrix"><em>Covariance</em></a>, <a href="https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors"><em>Eigenvectors</em></a>, and <a href="https://en.wikipedia.org/wiki/Singular-value_decomposition"><em>Singular Value Decomposition</em></a>.</p> <blockquote> <p>Note: Some resources to read about the aforementioned topics:</p> <ol> <li>Eigenvalues &amp; Eigenvectors: <a href="http://setosa.io/ev/eigenvectors-and-eigenvalues/">Setosa visualization</a>, <a href="https://www.youtube.com/watch?v=PFDu9oVAE-g&amp;t=0s&amp;list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&amp;index=15">3Blue1Brown</a></li> <li>SVD: <a href="https://medium.com/the-andela-way/foundations-of-machine-learning-singular-value-decomposition-svd-162ac796c27d">This nice Medium blogpost</a></li> </ol> </blockquote> <p>For example, take some data (say, \(X\)) with zero mean (if the mean is not zero, subtract the mean \(\mu\) from all values \(x_i\)). The covariance of this data (say, \(C_X\)) is given by:</p> \[C_X = \frac{1}{n}\cdot X\cdot X^T\] <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <img class="img-fluid rounded z-depth-1" src="/assets/img/PCA/data.png"/> </div> </div> <p><br/></p> <p>We want to find a transformation \(W\) to apply to the data \(X\) so that in the resulting data \(Y\), the variables are uncorrelated with each other. In simple terms, the covariance between any two distinct variables of \(Y\) will be zero, i.e. 
the non-diagonal elements of the covariance matrix \(C_Y\) of \(Y\) will be zero. This implies that \(C_Y\) will be a diagonal matrix.</p> <p>Writing the transformation from \(X\) to \(Y\), we have:</p> \[Y = W\cdot X\] <p>To solve for the covariance matrix of \(Y\), we can write</p> \[C_Y = \frac{1}{n}\cdot Y\cdot Y^T\] <p>and since \(Y = W\cdot X\), we have</p> \[\begin{aligned} C_Y &amp;= \frac{1}{n}\cdot W\cdot X\cdot (W\cdot X)^T\\ &amp;= \frac{1}{n}\cdot W\cdot X\cdot X^T\cdot W^T\\ &amp;= W\cdot (\frac{1}{n}\cdot X\cdot X^T)\cdot W^T\\ &amp;= W\cdot C_X\cdot W^T \end{aligned}\] <p>or, for an orthogonal \(W\) (so that \(W^{-1} = W^T\)),</p> \[C_X = W^T\cdot C_Y\cdot W\] <p>We know that \(C_Y\) is supposed to be a diagonal matrix. What does this equation remind us of? <em>But of course</em>, the eigendecomposition of \(C_X\), which for a symmetric positive semi-definite matrix coincides with its Singular Value Decomposition (SVD). Thus, if we take \(W\) as the matrix of the eigenvectors and \(C_Y\) as the diagonal matrix of the eigenvalues, the above equation will hold true, making the matrix \(W\), of eigenvectors of the covariance of \(X\), our transformation matrix.</p> <p>Computing the above values for our data, and plotting the directions of the obtained eigenvectors, we get the following:</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <img class="img-fluid rounded z-depth-1" src="/assets/img/PCA/eig.png"/> </div> </div> <p><br/></p> <p>As can be seen clearly, one of the eigenvectors falls along the direction of maximum variance of the data. On transforming the data \(X\) into \(Y\), and plotting again, we get:</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <img class="img-fluid rounded z-depth-1" src="/assets/img/PCA/trans.png"/> </div> </div> <p><br/></p> <p>Printing the covariance of the new data \(Y\), we can see it’s a diagonal matrix. 
Also, the expression \(W^T\cdot C_Y\cdot W\) returns the original covariance matrix \(C_X\).</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <img class="img-fluid rounded z-depth-1" src="/assets/img/PCA/sig.png"/> </div> </div> <p><br/></p> <h3 id="dimension-reduction">Dimension Reduction</h3> <p>One of the major strengths of PCA is its ability to pick out the dimensions of maximum variation: projecting the data onto those components alone loses little of its information, and the data can be reconstructed to an approximation of its original form from the lower-dimensional data.</p> <p>On paying more attention to the covariance matrix \(C_Y\), we see that the magnitude of each eigenvalue along the diagonal of the matrix is related to the amount of variance explained by the corresponding eigenvector direction.</p> <p>So, sorting the eigenvalue and eigenvector pairs in decreasing order of eigenvalue and taking only the top ones becomes the ideal way of choosing the eigenvectors for obtaining the maximum explained variance.</p> <p>For further demonstration, let’s use another dataset (<a href="http://yann.lecun.com/exdb/mnist/">MNIST</a>) for PCA.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <img class="img-fluid rounded z-depth-1" src="/assets/img/PCA/mnist.png"/> </div> </div> <p><br/></p> <p>Computing the eigenvectors and eigenvalues for the above dataset and sorting them on the basis of eigenvalues (descending order), we can store them back in numpy arrays.</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <img class="img-fluid rounded z-depth-1" src="/assets/img/PCA/mnist_eig.png"/> </div> </div> <p><br/></p> <p>And plot the eigenvalues, and the cumulative sum of the eigenvalues (<strong>Explained Variances</strong>).</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <img class="img-fluid rounded z-depth-1" src="/assets/img/PCA/exp_var.png"/> </div> </div> 
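<p>As a rough sketch of the sorting and cumulative-sum steps described above, the snippet below uses NumPy on small synthetic data rather than MNIST. The data, the 95% threshold and the variable names are assumptions made for illustration only; here the rows of <code>X</code> are samples and the columns of <code>eigvecs</code> are eigenvectors.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python">import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 500 samples, 10 features with decaying spread.
X = rng.standard_normal((500, 10)) * np.linspace(3.0, 0.1, 10)
X = X - X.mean(axis=0)  # zero-mean every feature

# Covariance of the features (rows are samples in this sketch).
C = X.T @ X / X.shape[0]

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]  # re-sort in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Cumulative explained variance, as a fraction of the total variance.
explained = np.cumsum(eigvals) / eigvals.sum()

# Smallest k whose cumulative explained variance reaches 95%.
k = int(np.searchsorted(explained, 0.95) + 1)

W_k = eigvecs[:, :k]  # keep the top-k eigenvectors
Y = X @ W_k           # reduced representation
X_rec = Y @ W_k.T     # approximate reconstruction
print(k, explained[k - 1])
</code></pre></figure>
<p>The covariance of <code>Y</code> in this sketch is diagonal, with the top <code>k</code> eigenvalues on its diagonal, which mirrors the earlier observation about \(C_Y\).</p>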
<p><br/></p> <p>From the cumulative-sum curve, which denotes the explained variance of the original data, we can conclude that approximately 150 dimensions are enough to capture ~95% of the variance of the original dataset, and about 326 dimensions out of 784 for ~99%.</p> <p>To reduce the number of dimensions, we have to select the number of dimensions we want, \(k\), and use only the first \(k\) columns of \(W\) to form the transformation matrix (say, \(W'\)). Thus the transformation and reconstruction operations become:</p> \[Y_{m \times k} = X_{m \times n} \cdot W'_{n \times k}\\ \\ X'_{m \times n} = Y_{m \times k} \cdot W'^T_{k \times n}\] <p>Let’s now pick only 2 dimensions (~23% explained variance), plot the points as a scatter plot, and color them based on the class label from the training set. Let’s use the scikit-learn package for this last operation:</p> <div class="row mt-3"> <div class="col-sm mt-3 mt-md-0"> <img class="img-fluid rounded z-depth-1" src="/assets/img/PCA/pca_data.png"/> </div> </div> <p><br/></p> <p>From the scatter plot, we can do some simple analysis and see some relationship between the color of the points (labels) and their location on the plot. For instance, the green cluster (representing the label <strong>1</strong>) is clearly distinct from the others, while the clusters for the colors brown and pink (for digits <strong>4</strong> and <strong>9</strong>) are somewhat in the same region, etc.</p> <p>Although the explained variance with 2 dimensions was roughly 23%, we still can derive some meaningful information about the data. 
Keeping more dimensions (while still far fewer than the original 784) retains more of the variance and remains easier to process and analyse than the original data.</p> <p>Also, applying PCA would make it easier to use the data in models such as <a href="https://en.wikipedia.org/wiki/Naive_Bayes_classifier">Naive Bayes</a>, where the core assumption is that the columns are independent of each other (PCA makes the transformed columns uncorrelated, which is closer to, though not the same as, that assumption).</p> <blockquote> <p>Note: If we want to keep the physical meaning of the columns in the dataset intact, using PCA would be a bad idea since the transformed columns are linear combinations of the original columns. Hence, the new columns would lose their original meaning.</p> <p>Also, dimension reduction is useful only if the eigenvalues vary significantly for the data distribution. For eigenvalues in similar ranges, each column will have a similar contribution towards the variation in the data, hence removing them would cause greater loss.</p> </blockquote> <p>–</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Principal Component Analysis]]></summary></entry><entry><title type="html">Object Detection with R-CNN Family</title><link href="https://shubham1810.github.io/blog/2017/objectdetection/" rel="alternate" type="text/html" title="Object Detection with R-CNN Family"/><published>2017-12-05T09:00:00+00:00</published><updated>2017-12-05T09:00:00+00:00</updated><id>https://shubham1810.github.io/blog/2017/objectdetection</id><content type="html" xml:base="https://shubham1810.github.io/blog/2017/objectdetection/"><![CDATA[<p>Convolutional Neural Networks (CNNs) are widely used, mainly for image classification (classifying an object in an image into one of the given categories) and have been shown to perform very well on huge datasets (for example, the ImageNet challenge [link]). Even with the huge success of CNNs in classification, the task of actually understanding an image still remains a challenge. 
One such task that corresponds to image understanding is object detection, wherein the task is to detect objects in an image and specify where these objects appear in the image (using a bounding box or masking etc.).</p> <p>Several algorithms have been proposed to solve the task of object detection, and one such class of methods to be discussed in this post is the R-CNN family of algorithms (R-CNN [], fast R-CNN [], faster R-CNN [], Mask R-CNN []).</p> <h2 id="r-cnn">R-CNN</h2> <p>R-CNN, or <strong>Regions with CNN features</strong>, is a method for object detection proposed in 2014.</p> <p>–</p>]]></content><author><name></name></author><category term="blog"/><category term="rcnn"/><summary type="html"><![CDATA[Convolutional Neural Networks (CNNs) are widely used, mainly for image classification (classifying an object in an image into one of the given categories) and have been shown to perform very well on huge datasets (for example, the ImageNet challenge [link]). Even with the huge success of CNNs in classification, the task of actually understanding an image still remains a challenge. One such task that corresponds to image understanding is object detection, wherein the task is to detect objects in an image and specify where these objects appear in the image (using a bounding box or masking etc.).]]></summary></entry><entry><title type="html">Markov Chains</title><link href="https://shubham1810.github.io/blog/2017/mc/" rel="alternate" type="text/html" title="Markov Chains"/><published>2017-10-14T09:00:00+00:00</published><updated>2017-10-14T09:00:00+00:00</updated><id>https://shubham1810.github.io/blog/2017/mc</id><content type="html" xml:base="https://shubham1810.github.io/blog/2017/mc/"><![CDATA[<p>A Markov chain is a memoryless stochastic process (a sequence of states) that jumps from one state to another, following the <a href="">Markov property</a>. A <strong>state</strong> can be thought of as a situation/event or a set of values. 
One example of a Markov chain is the weather: with <em>Sunny</em> and <em>Rainy</em> being two weather conditions (states), one sample sequence of events could be as follows:</p> <figure class="highlight"><pre><code class="language-css" data-lang="css"><span class="nt">Rainy</span> <span class="nt">Sunny</span> <span class="nt">Rainy</span> <span class="nt">Rainy</span> <span class="nt">Sunny</span> <span class="nt">Sunny</span> <span class="nt">Sunny</span> <span class="nt">Sunny</span> <span class="nt">Rainy</span><span class="o">...</span></code></pre></figure> <p>The Markov chain makes its shifts, or <em>transitions</em>, according to a Transition Probability Matrix, \(T\), which contains information about how probable it is to visit state \(j\) when the current state is \(i\), for all possible states of the system (called the state space, \(S\)). The Markov property states that the conditional probability distribution of the future states depends only on the present state, not on the sequence of previous states. Mathematically, assume \(X\) is a sequence of states \(x_i \in S\); then \(X = x_n, x_{n-1}, ..., x_0\) is a Markov sequence iff:</p> \[\mathbb{P}(X_n = x_n | X_{n-1} = x_{n-1}, ..., X_0 = x_0) = \mathbb{P}(X_n = x_n | X_{n-1} = x_{n-1})\] <p>where the probability of transition is taken from \(T\), i.e.</p> \[\mathbb{P}(X_n = j | X_{n-1} = i) = T_{ij}; T = \begin{bmatrix} p_{11} &amp; p_{12} &amp; p_{13} &amp; \dots &amp; p_{1m} \\ p_{21} &amp; p_{22} &amp; p_{23} &amp; \dots &amp; p_{2m} \\ \vdots &amp; \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\ p_{m1} &amp; p_{m2} &amp; p_{m3} &amp; \dots &amp; p_{mm} \end{bmatrix}\] <p>It is because of the Markov property that the Markov chain is called a memoryless process, since there is no need to store the past states in memory. The system jumps from one state to another following the probability distribution given by the transition probability matrix \(T\). 
An excellent interactive example of a Markov chain can be found <a href="http://setosa.io/markov">here</a>.</p> <p>Also, since Markov chains give the probability of going from a state \(i\) to state \(j\) (\(i, j \in S\)) in one step, they can also be used to compute the probability of going from state \(i\) to state \(j\) in some number of steps \(k\). The probability of going from \(i\) to \(j\) in 2 steps through a particular intermediate state \(p\), \(i \to p \to j\), is:</p> \[\mathbb{P}(X_n = j | X_{n-1} = p) \cdot \mathbb{P}(X_{n-1} = p | X_{n-2} = i) = T_{ip} \cdot T_{pj}\] <p>Summing this product over all intermediate states \(p \in S\) gives the total two-step probability, which is exactly the element at position (\(i, j\)) of the matrix \(A = T^2\). In general, the probability for \(k\) steps is the element \((T^k)_{ij}\).</p> <p>A few popular applications of Markov chains include Google PageRank, autocomplete/typing word prediction, and generating sequences of text (for sentences) or pixels (for images).</p> <p>–</p> <p>TODO: Add code + example for text generation using Markov chains.</p> <p>–</p>]]></content><author><name></name></author><category term="blog"/><category term="markov-chains"/><summary type="html"><![CDATA[A Markov chain is a memoryless mathematical process (or sequence) which jumps from one state to another, following the rules of the Markov property. A state can be thought of as a situation/event or a set of values. 
One example of a Markov chain is the weather: with Sunny and Rainy as the two weather conditions (states), a sample sequence of events could be as follows:]]></summary></entry><entry><title type="html">Evolutionary Algorithms I: Differential Evolution</title><link href="https://shubham1810.github.io/blog/2017/de/" rel="alternate" type="text/html" title="Evolutionary Algorithms I: Differential Evolution"/><published>2017-06-15T09:10:00+00:00</published><updated>2017-06-15T09:10:00+00:00</updated><id>https://shubham1810.github.io/blog/2017/de</id><content type="html" xml:base="https://shubham1810.github.io/blog/2017/de/"><![CDATA[<p>Evolutionary Algorithms are a family of algorithms for global optimization inspired by biological evolution, and are based on meta-heuristic search approaches. The possible solutions usually span an n-dimensional vector space over the problem domain, and we simulate a population of several particles to reach a global optimum.</p> <p>An optimization problem, in its basic form, consists of maximizing or minimizing a real function by choosing values from a pool of possible solution elements (vectors) according to procedural instructions provided for the algorithm. 
Evolutionary approaches usually follow a specific strategy, with different variations, to select candidate elements from the population set and apply crossover and/or mutations to modify the elements while trying to improve the quality of the modified elements.</p> <p>These algorithms can be applied to several interesting problems, and have been shown to perform well on NP-hard problems, including the Travelling Salesman Problem, Job-Shop Scheduling and Graph Coloring, while also having applications in domains such as Signals and Systems, Mechanical Engineering, and mathematical optimization.</p> <p>One such algorithm belonging to the family of Evolutionary Algorithms is the Differential Evolution (DE) algorithm. In this post, we shall discuss a few properties of the Differential Evolution algorithm while implementing it in Python (github link) for optimizing a few test functions.</p> <h2 id="differential-evolution">Differential Evolution</h2> <p>DE approaches an optimization problem by iteratively trying to improve a set of candidate solutions for a given measure of quality (cost function). This set of algorithms falls under meta-heuristics since they make few or no assumptions about the problem being optimized and can search very large spaces of possible solution elements. The algorithm involves maintaining a population of candidate solutions subjected to iterations of recombination, evaluation and selection. Creating a new candidate solution requires applying a linear operation to elements selected from the population, scaled by a parameter \(F\) called the differential weight, to generate a vector, and then randomly applying crossover based on the crossover probability parameter, 
\(CR\).</p> <p>The algorithm proceeds as follows:</p> <ol> <li>Initialize a set of agents/elements \(x\) with random positions in the search space for population size \(P\).</li> <li>Until a termination criterion is met (number of iterations or required optimality), repeat the following for each agent \(x_i\): <ul> <li>Pick three distinct agents \(a, b\), and \(c\) from the population at random.</li> <li>Pick a random index \(R \in \{1,...,n\}\) (\(n\) is the dimensionality of the problem).</li> <li> <p>Compute a temporary vector \(y\) as follows:</p> \[y = a + F (b-c)\] </li> <li>Now, to build the trial vector \(x_I\), for each \(j \in \{1,...,n\}\), pick a uniformly distributed number \(r_j \sim U(0, 1)\).</li> <li>If \(r_j \lt CR\) or \(j=R\), then <ul> <li>set \(x_{I, j} = y_{j}\)</li> </ul> </li> <li> <p>Otherwise, \(x_{I, j} = x_{i, j}\)</p> </li> <li>If \(f(x_{I}) \lt f(x_i)\) (\(f\) is the cost function for minimization), then <ul> <li>replace \(x_i\) with \(x_{I}\).</li> </ul> </li> <li>Otherwise, \(x_i\) remains unchanged.</li> </ul> </li> <li>Pick the agent from the population that has the highest fitness or lowest cost function value as the solution.</li> </ol> <h2 id="implementing-the-algorithm">Implementing the Algorithm</h2> <p>The directory structure for the code is as given below:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="p">.</span>
<span class="err">├──</span> <span class="n">differential_evolution</span><span class="p">.</span><span class="n">py</span>
<span class="err">└──</span> <span class="n">helpers</span>
    <span class="err">├──</span> <span class="n">__init__</span><span class="p">.</span><span class="n">py</span>
    <span class="err">├──</span> <span class="n">point</span><span class="p">.</span><span class="n">py</span>
    <span class="err">├──</span> <span class="n">population</span><span class="p">.</span><span class="n">py</span>
    <span class="err">└──</span> <span class="n">test_functions</span><span class="p">.</span><span class="n">py</span></code></pre></figure> <p>Here, <em>differential_evolution.py</em> is the main file we’ll run to execute the algorithm. The helpers directory consists of helper classes and functions for several operations: handling the point objects and vector operations related to candidate elements (<em>point.py</em>), methods for handling the collection of all such points and building the population (<em>population.py</em>), and test functions to be used as objective/cost functions for testing the efficiency of the algorithm (<em>test_functions.py</em>).</p> <h3 id="building-the-point-class">Building The Point Class</h3> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># helpers/point.py
</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>


<span class="k">class</span> <span class="nc">Point</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">dim</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">upper_limit</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">lower_limit</span><span class="o">=-</span><span class="mi">10</span><span class="p">,</span> <span class="n">objective</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">dim</span> <span class="o">=</span> <span class="n">dim</span>
        <span class="n">self</span><span class="p">.</span><span class="n">coords</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">zeros</span><span class="p">((</span><span class="n">self</span><span class="p">.</span><span class="n">dim</span><span class="p">,))</span>
        <span class="n">self</span><span class="p">.</span><span class="n">z</span> <span class="o">=</span> <span class="bp">None</span>
        <span class="n">self</span><span class="p">.</span><span class="n">range_upper_limit</span> <span class="o">=</span> <span class="n">upper_limit</span>
        <span class="n">self</span><span class="p">.</span><span class="n">range_lower_limit</span> <span class="o">=</span> <span class="n">lower_limit</span>
        <span class="n">self</span><span class="p">.</span><span class="n">objective</span> <span class="o">=</span> <span class="n">objective</span>
        <span class="n">self</span><span class="p">.</span><span class="nf">evaluate_point</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">generate_random_point</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">coords</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">uniform</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">range_lower_limit</span><span class="p">,</span> <span class="n">self</span><span class="p">.</span><span class="n">range_upper_limit</span><span class="p">,</span> <span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">dim</span><span class="p">,))</span>
        <span class="n">self</span><span class="p">.</span><span class="nf">evaluate_point</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">evaluate_point</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="c1"># Skip evaluation when no objective is set (z stays None).
</span>        <span class="k">if</span> <span class="n">self</span><span class="p">.</span><span class="n">objective</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
            <span class="n">self</span><span class="p">.</span><span class="n">z</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">objective</span><span class="p">.</span><span class="nf">evaluate</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">coords</span><span class="p">)</span></code></pre></figure> <p>Here, we’re initializing the Point class with <em>dim</em>, which is the dimension of the vector, while <em>lower_limit</em> and <em>upper_limit</em> specify the domain of each coordinate of the vector. <em>self.z</em> is the objective function value of the point, stored with each instance to make it easy to rank points by their objective function value. The <em>evaluate_point</em> function evaluates the test function at the given point. The <em>Point</em> class creates instances of vector objects, each representing an individual in the population. The collection of individuals is defined in the <em>Population</em> class.</p> <h3 id="the-population-class">The Population Class</h3> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># helpers/population.py
</span>
<span class="kn">import</span> <span class="n">copy</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="n">matplotlib</span> <span class="kn">import</span> <span class="n">pyplot</span> <span class="k">as</span> <span class="n">plt</span>

<span class="kn">from</span> <span class="n">.point</span> <span class="kn">import</span> <span class="n">Point</span>

<span class="k">class</span> <span class="nc">Population</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">dim</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">num_points</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">upper_limit</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">lower_limit</span><span class="o">=-</span><span class="mi">10</span><span class="p">,</span> <span class="n">init_generate</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">objective</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="n">self</span><span class="p">.</span><span class="n">points</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="n">self</span><span class="p">.</span><span class="n">num_points</span> <span class="o">=</span> <span class="n">num_points</span>
        <span class="n">self</span><span class="p">.</span><span class="n">init_generate</span> <span class="o">=</span> <span class="n">init_generate</span>
        <span class="n">self</span><span class="p">.</span><span class="n">dim</span> <span class="o">=</span> <span class="n">dim</span>
        <span class="n">self</span><span class="p">.</span><span class="n">range_upper_limit</span> <span class="o">=</span> <span class="n">upper_limit</span>
        <span class="n">self</span><span class="p">.</span><span class="n">range_lower_limit</span> <span class="o">=</span> <span class="n">lower_limit</span>
        <span class="n">self</span><span class="p">.</span><span class="n">objective</span> <span class="o">=</span> <span class="n">objective</span>
        <span class="c1"># If initial generation parameter is true, then generate collection
</span>        <span class="k">if</span> <span class="n">self</span><span class="p">.</span><span class="n">init_generate</span><span class="p">:</span>
            <span class="k">for</span> <span class="n">ix</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">num_points</span><span class="p">):</span>
                <span class="n">new_point</span> <span class="o">=</span> <span class="nc">Point</span><span class="p">(</span><span class="n">dim</span><span class="o">=</span><span class="n">dim</span><span class="p">,</span> <span class="n">upper_limit</span><span class="o">=</span><span class="n">self</span><span class="p">.</span><span class="n">range_upper_limit</span><span class="p">,</span>
                                  <span class="n">lower_limit</span><span class="o">=</span><span class="n">self</span><span class="p">.</span><span class="n">range_lower_limit</span><span class="p">,</span> <span class="n">objective</span><span class="o">=</span><span class="n">self</span><span class="p">.</span><span class="n">objective</span><span class="p">)</span>
                <span class="n">new_point</span><span class="p">.</span><span class="nf">generate_random_point</span><span class="p">()</span>
                <span class="n">self</span><span class="p">.</span><span class="n">points</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">new_point</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">get_average_objective</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="n">avg</span> <span class="o">=</span> <span class="mf">0.0</span>

        <span class="k">for</span> <span class="n">px</span> <span class="ow">in</span> <span class="n">self</span><span class="p">.</span><span class="n">points</span><span class="p">:</span>
            <span class="n">avg</span> <span class="o">+=</span> <span class="n">px</span><span class="p">.</span><span class="n">z</span>
        <span class="n">avg</span> <span class="o">=</span> <span class="n">avg</span><span class="o">/</span><span class="nf">float</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">num_points</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">avg</span></code></pre></figure> <p>The <em>Population</em> class contains the set of <em>Point</em> instances acting as individuals in the population. The individuals are stored in the <em>self.points</em> list. The parameters of the class are <em>num_points</em>, the population size, and <em>dim</em>, <em>upper_limit</em> and <em>lower_limit</em> as discussed above. The <em>objective</em> parameter refers to an object of the <em>Function</em> class and is the objective function (discussed in the next section). As an optional parameter, <em>init_generate</em> controls the generation of the initial population: if set to <em>False</em>, the initial population will be empty and the elements will need to be added through the main procedure of the algorithm. The <em>get_average_objective</em> function returns the mean evaluated objective value of the population.</p> <h3 id="the-objective-functions">The Objective Functions</h3> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># helpers/test_functions.py
</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>


<span class="k">class</span> <span class="nc">Function</span><span class="p">:</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">func</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>

        <span class="n">self</span><span class="p">.</span><span class="n">objectives</span> <span class="o">=</span> <span class="p">{</span>
            <span class="sh">'</span><span class="s">sphere</span><span class="sh">'</span><span class="p">:</span> <span class="n">self</span><span class="p">.</span><span class="n">sphere</span><span class="p">,</span>
            <span class="sh">'</span><span class="s">ackley</span><span class="sh">'</span><span class="p">:</span> <span class="n">self</span><span class="p">.</span><span class="n">ackley</span><span class="p">,</span>
            <span class="sh">'</span><span class="s">rosenbrock</span><span class="sh">'</span><span class="p">:</span> <span class="n">self</span><span class="p">.</span><span class="n">rosenbrock</span><span class="p">,</span>
            <span class="sh">'</span><span class="s">rastrigin</span><span class="sh">'</span><span class="p">:</span> <span class="n">self</span><span class="p">.</span><span class="n">rastrigin</span><span class="p">,</span>
        <span class="p">}</span>
        
        <span class="k">if</span> <span class="n">func</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
            <span class="n">self</span><span class="p">.</span><span class="n">func_name</span> <span class="o">=</span> <span class="sh">'</span><span class="s">sphere</span><span class="sh">'</span>
            <span class="n">self</span><span class="p">.</span><span class="n">func</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">objectives</span><span class="p">[</span><span class="n">self</span><span class="p">.</span><span class="n">func_name</span><span class="p">]</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">if</span> <span class="nf">isinstance</span><span class="p">(</span><span class="n">func</span><span class="p">,</span> <span class="nb">str</span><span class="p">):</span>
                <span class="n">self</span><span class="p">.</span><span class="n">func_name</span> <span class="o">=</span> <span class="n">func</span>
                <span class="n">self</span><span class="p">.</span><span class="n">func</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">objectives</span><span class="p">[</span><span class="n">self</span><span class="p">.</span><span class="n">func_name</span><span class="p">]</span>
            <span class="k">else</span><span class="p">:</span>
                <span class="n">self</span><span class="p">.</span><span class="n">func</span> <span class="o">=</span> <span class="n">func</span>
                <span class="n">self</span><span class="p">.</span><span class="n">func_name</span> <span class="o">=</span> <span class="n">func</span><span class="p">.</span><span class="n">__name__</span>

    <span class="k">def</span> <span class="nf">evaluate</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">point</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">self</span><span class="p">.</span><span class="nf">func</span><span class="p">(</span><span class="n">point</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">sphere</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">d</span> <span class="o">=</span> <span class="n">x</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
        <span class="n">f</span> <span class="o">=</span> <span class="mf">0.0</span>

        <span class="k">for</span> <span class="n">dx</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">d</span><span class="p">):</span>
            <span class="n">f</span> <span class="o">+=</span> <span class="n">x</span><span class="p">[</span><span class="n">dx</span><span class="p">]</span> <span class="o">**</span> <span class="mi">2</span>
        
        <span class="k">return</span> <span class="n">f</span>

    <span class="k">def</span> <span class="nf">ackley</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">z1</span><span class="p">,</span> <span class="n">z2</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span>

        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">x</span><span class="p">)):</span>
            <span class="n">z1</span> <span class="o">+=</span> <span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">**</span> <span class="mi">2</span>
            <span class="n">z2</span> <span class="o">+=</span> <span class="n">np</span><span class="p">.</span><span class="nf">cos</span><span class="p">(</span><span class="mf">2.0</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">pi</span> <span class="o">*</span> <span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>

        <span class="nf">return </span><span class="p">(</span><span class="o">-</span><span class="mf">20.0</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="nf">exp</span><span class="p">(</span><span class="o">-</span><span class="mf">0.2</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">z1</span> <span class="o">/</span> <span class="nf">len</span><span class="p">(</span><span class="n">x</span><span class="p">))))</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="nf">exp</span><span class="p">(</span><span class="n">z2</span> <span class="o">/</span> <span class="nf">len</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="o">+</span> <span class="n">np</span><span class="p">.</span><span class="n">e</span> <span class="o">+</span> <span class="mf">20.0</span>

    <span class="k">def</span> <span class="nf">rosenbrock</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">v</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">):</span>
            <span class="n">v</span> <span class="o">+=</span> <span class="mi">100</span> <span class="o">*</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span> <span class="o">+</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span>

        <span class="k">return</span> <span class="n">v</span>

    <span class="k">def</span> <span class="nf">rastrigin</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">v</span> <span class="o">=</span> <span class="mi">0</span>

        <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">x</span><span class="p">)):</span>
            <span class="n">v</span> <span class="o">+=</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="o">-</span> <span class="p">(</span><span class="mi">10</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="nf">cos</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">pi</span> <span class="o">*</span> <span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]))</span>

        <span class="nf">return </span><span class="p">(</span><span class="mi">10</span> <span class="o">*</span> <span class="nf">len</span><span class="p">(</span><span class="n">x</span><span class="p">))</span> <span class="o">+</span> <span class="n">v</span></code></pre></figure> <p>The <em>test_functions.py</em> file contains the implementation of the <em>Function</em> class, which creates an objective function object. The only parameter to the constructor is <em>func</em>, which can either be a string or a function. If it is <em>None</em>, the class stores the <em>sphere</em> function in <em>self.func</em>; otherwise it checks whether the value is a string. For a string, it assigns the function with the same name implemented in the class (stored in the dictionary <em>self.objectives</em>). For a function, it assumes that the function accepts a numpy ndarray as input and returns a scalar quantity as the objective function value.</p> <p>The objective functions implemented by default currently include the <em>sphere</em>, <em>ackley</em>, <em>rosenbrock</em>, and <em>rastrigin</em> functions. A list of optimization test functions can be found <a href="https://www.sfu.ca/~ssurjano/optimization.html">here</a>. These are all defined in a multi-dimensional vector space and exhibit either unimodal or multi-modal properties. For example, the <em>sphere</em> function is a unimodal convex function, while the <em>rastrigin</em> function is a multi-modal non-convex function. The representation of the rastrigin function in a 3-D space is shown below (the vertical axis is the value of the objective function):</p> <p><img src="https://upload.wikimedia.org/wikipedia/commons/8/8b/Rastrigin_function.png" alt="ras"/></p> <h3 id="the-differential-evolution-class">The Differential Evolution Class</h3> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># differential_evolution.py
</span>
<span class="kn">import</span> <span class="n">copy</span>
<span class="kn">import</span> <span class="n">random</span>
<span class="kn">import</span> <span class="n">time</span>

<span class="kn">from</span> <span class="n">helpers.population</span> <span class="kn">import</span> <span class="n">Population</span>
<span class="kn">from</span> <span class="n">helpers</span> <span class="kn">import</span> <span class="n">get_best_point</span>
<span class="kn">from</span> <span class="n">helpers.test_functions</span> <span class="kn">import</span> <span class="n">Function</span>


<span class="k">class</span> <span class="nc">DifferentialEvolution</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">num_iterations</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">CR</span><span class="o">=</span><span class="mf">0.4</span><span class="p">,</span> <span class="n">F</span><span class="o">=</span><span class="mf">0.48</span><span class="p">,</span> <span class="n">dim</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">population_size</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">print_status</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">func</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="n">random</span><span class="p">.</span><span class="nf">seed</span><span class="p">()</span>
        <span class="n">self</span><span class="p">.</span><span class="n">print_status</span> <span class="o">=</span> <span class="n">print_status</span>
        <span class="n">self</span><span class="p">.</span><span class="n">num_iterations</span> <span class="o">=</span> <span class="n">num_iterations</span>
        <span class="n">self</span><span class="p">.</span><span class="n">iteration</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="n">self</span><span class="p">.</span><span class="n">CR</span> <span class="o">=</span> <span class="n">CR</span>
        <span class="n">self</span><span class="p">.</span><span class="n">F</span> <span class="o">=</span> <span class="n">F</span>
        <span class="n">self</span><span class="p">.</span><span class="n">population_size</span> <span class="o">=</span> <span class="n">population_size</span>
        <span class="n">self</span><span class="p">.</span><span class="n">func</span> <span class="o">=</span> <span class="nc">Function</span><span class="p">(</span><span class="n">func</span><span class="o">=</span><span class="n">func</span><span class="p">)</span>
        <span class="n">self</span><span class="p">.</span><span class="n">population</span> <span class="o">=</span> <span class="nc">Population</span><span class="p">(</span><span class="n">dim</span><span class="o">=</span><span class="n">dim</span><span class="p">,</span> <span class="n">num_points</span><span class="o">=</span><span class="n">self</span><span class="p">.</span><span class="n">population_size</span><span class="p">,</span> <span class="n">objective</span><span class="o">=</span><span class="n">self</span><span class="p">.</span><span class="n">func</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">iterate</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">ix</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">population</span><span class="p">.</span><span class="n">num_points</span><span class="p">):</span>
            <span class="n">x</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">population</span><span class="p">.</span><span class="n">points</span><span class="p">[</span><span class="n">ix</span><span class="p">]</span>
            <span class="p">[</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">]</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="nf">sample</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">population</span><span class="p">.</span><span class="n">points</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
            <span class="k">while</span> <span class="n">x</span> <span class="o">==</span> <span class="n">a</span> <span class="ow">or</span> <span class="n">x</span> <span class="o">==</span> <span class="n">b</span> <span class="ow">or</span> <span class="n">x</span> <span class="o">==</span> <span class="n">c</span><span class="p">:</span>
                <span class="p">[</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">]</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="nf">sample</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">population</span><span class="p">.</span><span class="n">points</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>

            <span class="n">R</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="nf">randrange</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">dim</span><span class="p">)</span>
            <span class="n">y</span> <span class="o">=</span> <span class="n">copy</span><span class="p">.</span><span class="nf">deepcopy</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>

            <span class="k">for</span> <span class="n">iy</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">dim</span><span class="p">):</span>
                <span class="n">ri</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="nf">random</span><span class="p">()</span>

                <span class="k">if</span> <span class="n">ri</span> <span class="o">&lt;</span> <span class="n">self</span><span class="p">.</span><span class="n">CR</span> <span class="ow">or</span> <span class="n">iy</span> <span class="o">==</span> <span class="n">R</span><span class="p">:</span>
                    <span class="n">y</span><span class="p">.</span><span class="n">coords</span><span class="p">[</span><span class="n">iy</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span><span class="p">.</span><span class="n">coords</span><span class="p">[</span><span class="n">iy</span><span class="p">]</span> <span class="o">+</span> <span class="n">self</span><span class="p">.</span><span class="n">F</span> <span class="o">*</span> <span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">coords</span><span class="p">[</span><span class="n">iy</span><span class="p">]</span> <span class="o">-</span> <span class="n">c</span><span class="p">.</span><span class="n">coords</span><span class="p">[</span><span class="n">iy</span><span class="p">])</span>

            <span class="n">y</span><span class="p">.</span><span class="nf">evaluate_point</span><span class="p">()</span>
            <span class="k">if</span> <span class="n">y</span><span class="p">.</span><span class="n">z</span> <span class="o">&lt;</span> <span class="n">x</span><span class="p">.</span><span class="n">z</span><span class="p">:</span>
                <span class="n">self</span><span class="p">.</span><span class="n">population</span><span class="p">.</span><span class="n">points</span><span class="p">[</span><span class="n">ix</span><span class="p">]</span> <span class="o">=</span> <span class="n">y</span>
        <span class="n">self</span><span class="p">.</span><span class="n">iteration</span> <span class="o">+=</span> <span class="mi">1</span>

    <span class="k">def</span> <span class="nf">simulate</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="n">pnt</span> <span class="o">=</span> <span class="nf">get_best_point</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">population</span><span class="p">.</span><span class="n">points</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Initial best value: </span><span class="sh">"</span> <span class="o">+</span> <span class="nf">str</span><span class="p">(</span><span class="n">pnt</span><span class="p">.</span><span class="n">z</span><span class="p">))</span>
        <span class="k">while</span> <span class="n">self</span><span class="p">.</span><span class="n">iteration</span> <span class="o">&lt;</span> <span class="n">self</span><span class="p">.</span><span class="n">num_iterations</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">self</span><span class="p">.</span><span class="n">print_status</span> <span class="ow">and</span> <span class="n">self</span><span class="p">.</span><span class="n">iteration</span> <span class="o">%</span> <span class="mi">50</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
                <span class="n">pnt</span> <span class="o">=</span> <span class="nf">get_best_point</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">population</span><span class="p">.</span><span class="n">points</span><span class="p">)</span>
                <span class="nf">print</span><span class="p">(</span><span class="n">pnt</span><span class="p">.</span><span class="n">z</span><span class="p">,</span> <span class="n">self</span><span class="p">.</span><span class="n">population</span><span class="p">.</span><span class="nf">get_average_objective</span><span class="p">())</span>
            <span class="n">self</span><span class="p">.</span><span class="nf">iterate</span><span class="p">()</span>

        <span class="n">pnt</span> <span class="o">=</span> <span class="nf">get_best_point</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">population</span><span class="p">.</span><span class="n">points</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Final best value: </span><span class="sh">"</span> <span class="o">+</span> <span class="nf">str</span><span class="p">(</span><span class="n">pnt</span><span class="p">.</span><span class="n">z</span><span class="p">))</span>
        <span class="k">return</span> <span class="n">pnt</span><span class="p">.</span><span class="n">z</span></code></pre></figure> <p>Here, in the <em>DifferentialEvolution</em> class, the initializing parameters are:</p> <ol> <li><em>num_iterations</em> controls the number of generations/iterations the optimization loop runs, and acts as the stopping criterion.</li> <li><em>CR</em> and <em>F</em> are the Crossover Probability and the Differential Weight as defined in the algorithm.</li> <li><em>dim</em> is the number of dimensions of the individual vectors (the size of the vector space, \(x \in R^n\), where \(x\) is an individual vector).</li> <li><em>population_size</em> is passed to the <em>Population</em> class and the population object is stored in <em>self.population</em>.</li> <li><em>print_status</em> is a boolean value used for verbosity (when set, the best and average objective function values are printed every 50 iterations).</li> <li><em>func</em> accepts either the function name or the actual function and is used to create the <em>self.func</em> object, which is an instance of the <em>Function</em> class.</li> <li><em>self.iteration</em> keeps track of the current iteration/generation.</li> </ol> <p>There are essentially two member functions, <em>self.iterate</em> and <em>self.simulate</em>. The <em>self.iterate</em> function runs one iteration of the Differential Evolution procedure by applying the transformation operation and crossover to each individual in the population, and replacing the individual with the new candidate only if the candidate has a lower objective value. The <em>self.simulate</em> function calls <em>self.iterate</em> until the stopping criterion is met, then prints and returns the best value of the objective function.</p> <h3 id="demo">Demo</h3> <p>Now that we have an implementation of all the classes required for the Differential Evolution algorithm, we can write a small script to test everything out and see the results.</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># demo.py
</span>
<span class="kn">from</span> <span class="n">differential_evolution</span> <span class="kn">import</span> <span class="n">DifferentialEvolution</span>
<span class="kn">import</span> <span class="n">datetime</span>

<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="n">matplotlib</span> <span class="kn">import</span> <span class="n">pyplot</span> <span class="k">as</span> <span class="n">plt</span>

<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="sh">'</span><span class="s">__main__</span><span class="sh">'</span><span class="p">:</span>
    <span class="n">number_of_runs</span> <span class="o">=</span> <span class="mi">5</span>
    <span class="n">val</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">print_time</span> <span class="o">=</span> <span class="bp">True</span>

    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">number_of_runs</span><span class="p">):</span>
        <span class="n">start</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">datetime</span><span class="p">.</span><span class="nf">now</span><span class="p">()</span>
        <span class="n">de</span> <span class="o">=</span> <span class="nc">DifferentialEvolution</span><span class="p">(</span><span class="n">num_iterations</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">dim</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">CR</span><span class="o">=</span><span class="mf">0.4</span><span class="p">,</span> <span class="n">F</span><span class="o">=</span><span class="mf">0.48</span><span class="p">,</span> <span class="n">population_size</span><span class="o">=</span><span class="mi">75</span><span class="p">,</span> <span class="n">print_status</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">func</span><span class="o">=</span><span class="sh">'</span><span class="s">sphere</span><span class="sh">'</span><span class="p">)</span>
        <span class="n">val</span> <span class="o">+=</span> <span class="n">de</span><span class="p">.</span><span class="nf">simulate</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">print_time</span><span class="p">:</span>
            <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">Time taken:</span><span class="sh">"</span><span class="p">,</span> <span class="n">datetime</span><span class="p">.</span><span class="n">datetime</span><span class="p">.</span><span class="nf">now</span><span class="p">()</span> <span class="o">-</span> <span class="n">start</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">'</span><span class="s">-</span><span class="sh">'</span> <span class="o">*</span> <span class="mi">80</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="se">\n</span><span class="s">Final average of all runs:</span><span class="sh">"</span><span class="p">,</span> <span class="n">val</span> <span class="o">/</span> <span class="n">number_of_runs</span><span class="p">)</span></code></pre></figure> <p>This script initializes the variables <em>number_of_runs</em>, <em>val</em>, and <em>print_time</em>. <em>number_of_runs</em> sets how many independent runs of the algorithm are performed; after all runs, the average of the optimized objective function values is printed. <em>val</em> accumulates the optimized objective function value returned by each run and is later divided by <em>number_of_runs</em> to compute the average. <em>print_time</em> is a boolean that controls whether the computation time is printed for each run.</p> <p>Using the differential evolution algorithm to optimize the sphere test function in 50 dimensions (a 50-D vector space, i.e. <em>dim</em> set to 50), with 200 iterations per run, produces the following output:</p> <figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="c1"># Output
</span>
<span class="n">Initial</span> <span class="n">best</span> <span class="n">value</span><span class="p">:</span> <span class="mf">1285.50913073</span>
<span class="n">Final</span> <span class="n">best</span> <span class="n">value</span><span class="p">:</span> <span class="mf">0.0258755727525</span>

<span class="n">Time</span> <span class="n">taken</span><span class="p">:</span> <span class="mi">0</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mf">05.931056</span>
<span class="n">Initial</span> <span class="n">best</span> <span class="n">value</span><span class="p">:</span> <span class="mf">1218.54112743</span>
<span class="n">Final</span> <span class="n">best</span> <span class="n">value</span><span class="p">:</span> <span class="mf">0.0323126608382</span>

<span class="n">Time</span> <span class="n">taken</span><span class="p">:</span> <span class="mi">0</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mf">05.560921</span>
<span class="n">Initial</span> <span class="n">best</span> <span class="n">value</span><span class="p">:</span> <span class="mf">1253.1145944</span>
<span class="n">Final</span> <span class="n">best</span> <span class="n">value</span><span class="p">:</span> <span class="mf">0.0340955810298</span>

<span class="n">Time</span> <span class="n">taken</span><span class="p">:</span> <span class="mi">0</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mf">06.081233</span>
<span class="n">Initial</span> <span class="n">best</span> <span class="n">value</span><span class="p">:</span> <span class="mf">1298.5615981</span>
<span class="n">Final</span> <span class="n">best</span> <span class="n">value</span><span class="p">:</span> <span class="mf">0.0439433666035</span>

<span class="n">Time</span> <span class="n">taken</span><span class="p">:</span> <span class="mi">0</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mf">04.511034</span>
<span class="n">Initial</span> <span class="n">best</span> <span class="n">value</span><span class="p">:</span> <span class="mf">1228.13894559</span>
<span class="n">Final</span> <span class="n">best</span> <span class="n">value</span><span class="p">:</span> <span class="mf">0.0405344973595</span>

<span class="n">Time</span> <span class="n">taken</span><span class="p">:</span> <span class="mi">0</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mf">05.081286</span>
<span class="o">--------------------------------------------------------------------------------</span>

<span class="n">Final</span> <span class="n">average</span> <span class="n">of</span> <span class="nb">all</span> <span class="n">runs</span><span class="p">:</span> <span class="mf">0.0353523357167</span></code></pre></figure> <p>The plots of the objective function value against the iteration number for the sphere test function in 50D and the Rastrigin test function in 50D are shown below:</p> <p><img src="/images/EA/sphere_50d_DE.png" alt="res_sphere"/></p> <p><img src="/images/EA/rastrigin_50d_DE.png" alt="res_rastrigin"/></p> <p>The code is available in a GitHub repository <a href="https://github.com/shubham1810/Evolutionary_Algorithms_blog_code">here</a>.</p>]]></content><author><name></name></author><category term="blog"/><category term="ea"/><category term="de"/><summary type="html"><![CDATA[Evolutionary Algorithms are classified under a family of algorithms for global optimization by biological evolution, and are based on meta-heuristic search approaches. The possible solutions usually span an n-dimensional vector space over the problem domain and we simulate several population particles to reach a global optimum.]]></summary></entry></feed>