{"id":5203,"date":"2019-11-01T18:57:33","date_gmt":"2019-11-01T13:27:33","guid":{"rendered":"https:\/\/code4developers.com\/?p=5203"},"modified":"2019-11-01T18:57:33","modified_gmt":"2019-11-01T13:27:33","slug":"exploratory-data-analysis","status":"publish","type":"post","link":"https:\/\/code4developers.com\/exploratory-data-analysis\/","title":{"rendered":"Exploratory Data Analysis"},"content":{"rendered":"<p>Machine learning is an application of artificial intelligence (AI) that lets computers learn from given data without being explicitly programmed. Nowadays computers are powerful enough to be trained on large amounts of data in very little time. A data scientist must still know how the data varies, how it is categorized and how it is distributed. With the help of Exploratory Data Analysis (EDA) we can draw conclusions about the data that a human can observe through graphs, charts and summary values.<!--more--><\/p>\n<h4 id=\"5f2e\" class=\"fe ff dc bk bj fg fh fi fj fk fl fm fn fo fp fq fr\"><span id=\"definition\">Definition<\/span><\/h4>\n<p id=\"a066\" class=\"eq er dc bk es b et fs ev ft ex fu ez fv fb fw fd \">Exploratory Data Analysis refers to the critical process of performing initial investigations on data so as to discover patterns, spot anomalies, test hypotheses and check assumptions with the help of summary statistics and graphical representations.<\/p>\n<h4 id=\"explanation-of-eda-with-sample-iris-dataset\" class=\"eq er dc bk es b et fs ev ft ex fu ez fv fb fw fd \">Explanation of EDA with sample iris dataset:<\/h4>\n<p id=\"6a65\" class=\"eq er dc bk es b et fs ev ft ex fu ez fv fb fw fd\" data-selectable-paragraph=\"\">I am taking the iris dataset as a sample dataset and performing EDA on it. 
The iris dataset contains four features:<\/p>\n<ol>\n<li style=\"list-style-type: none;\">\n<ol>\n<li class=\"eq er dc bk es b et fs ev ft ex fu ez fv fb fw fd\">sepal_length<\/li>\n<li class=\"eq er dc bk es b et fs ev ft ex fu ez fv fb fw fd\">sepal_width<\/li>\n<li class=\"eq er dc bk es b et fs ev ft ex fu ez fv fb fw fd\">petal_length<\/li>\n<li class=\"eq er dc bk es b et fs ev ft ex fu ez fv fb fw fd\">petal_width<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p class=\"eq er dc bk es b et fs ev ft ex fu ez fv fb fw fd\" data-selectable-paragraph=\"\">and three classes:<\/p>\n<ol>\n<li style=\"list-style-type: none;\">\n<ol>\n<li class=\"eq er dc bk es b et fs ev ft ex fu ez fv fb fw fd\">setosa<\/li>\n<li class=\"eq er dc bk es b et fs ev ft ex fu ez fv fb fw fd\">virginica<\/li>\n<li class=\"eq er dc bk es b et fs ev ft ex fu ez fv fb fw fd\">versicolor<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p class=\"eq er dc bk es b et fs ev ft ex fu ez fv fb fw fd\" data-selectable-paragraph=\"\">Please click\u00a0<a class=\"at cg fx fy fz ga\" href=\"https:\/\/en.wikipedia.org\/wiki\/Iris_flower_data_set\" target=\"_blank\" rel=\"noopener noreferrer\">here<\/a> to get more information about the iris dataset.<\/p>\n<p><strong>The first step is to import the required libraries and then read the data file.<\/strong><\/p>\n<pre class=\"lang:default decode:true\">import pandas as pd\r\nimport seaborn as sns\r\nimport matplotlib.pyplot as plt\r\nimport numpy as np\r\n\r\n# Load the dataset (iris.csv is expected in the working directory)\r\niris = pd.read_csv(\"iris.csv\")<\/pre>\n<p>Next we show the number of rows and columns in the dataframe; the shape attribute provides that. After that we have to find out which columns the dataframe contains. 
The columns attribute returns the column labels of the dataframe.<\/p>\n<pre class=\"lang:default decode:true\">print(\"Shape of dataframe is\", iris.shape)\r\nprint(\"Columns of dataframe are\", iris.columns)<\/pre>\n<div class=\"gq gr cp t u gs ak dv gt gu\"><\/div>\n<div><img  loading=\"lazy\"  decoding=\"async\"  data-attachment-id=\"5222\"  data-permalink=\"https:\/\/code4developers.com\/exploratory-data-analysis\/da-1\/\"  data-orig-file=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png?fit=751%2C102&amp;ssl=1\"  data-orig-size=\"751,102\"  data-comments-opened=\"1\"  data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\"  data-image-title=\"DA 1\"  data-image-description=\"\"  data-image-caption=\"\"  data-medium-file=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png?fit=300%2C41&amp;ssl=1\"  data-large-file=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png?fit=700%2C95&amp;ssl=1\"  class=\"alignnone size-full wp-image-5222 pk-lazyload\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAu8AAABmAQMAAABBbe0rAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAACBJREFUaN7twTEBAAAAwqD1T20ND6AAAAAAAAAAAAD4MiXaAAGu94jbAAAAAElFTkSuQmCC\"  alt=\"Exploratory Data Analysis\"  width=\"751\"  height=\"102\"  data-pk-sizes=\"auto\"  data-ls-sizes=\"auto, (max-width: 751px) 100vw, 751px\"  data-pk-src=\"https:\/\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png\"  
data-pk-srcset=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png?w=751&amp;ssl=1 751w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png?resize=120%2C16&amp;ssl=1 120w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png?resize=90%2C12&amp;ssl=1 90w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png?resize=320%2C43&amp;ssl=1 320w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png?resize=560%2C76&amp;ssl=1 560w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png?resize=240%2C33&amp;ssl=1 240w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png?resize=180%2C24&amp;ssl=1 180w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png?resize=640%2C87&amp;ssl=1 640w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png?resize=300%2C41&amp;ssl=1 300w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-1.png?resize=700%2C95&amp;ssl=1 700w\" ><\/div>\n<div><\/div>\n<p>To see what the data looks like, we display the first few rows. The head() function provides that functionality.<\/p>\n<div>\n<pre class=\"lang:default decode:true\">iris.head()<\/pre>\n<p class=\"eq er dc bk es b et eu ev ew ex ey ez fa fb fc fd\" data-selectable-paragraph=\"\"><img  loading=\"lazy\"  decoding=\"async\"  data-attachment-id=\"5223\"  data-permalink=\"https:\/\/code4developers.com\/exploratory-data-analysis\/da-2\/\"  data-orig-file=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-2.png?fit=528%2C187&amp;ssl=1\"  data-orig-size=\"528,187\"  data-comments-opened=\"1\"  
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\"  data-image-title=\"DA 2\"  data-image-description=\"\"  data-image-caption=\"\"  data-medium-file=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-2.png?fit=300%2C106&amp;ssl=1\"  data-large-file=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-2.png?fit=528%2C187&amp;ssl=1\"  class=\"alignnone size-full wp-image-5223 pk-lazyload\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAhAAAAC7AQMAAAAOijXOAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAACNJREFUaN7twTEBAAAAwqD1T20ND6AAAAAAAAAAAAAAAADODDDxAAEPfE4RAAAAAElFTkSuQmCC\"  alt=\"DA 2\"  width=\"528\"  height=\"187\"  data-pk-sizes=\"auto\"  data-ls-sizes=\"auto, (max-width: 528px) 100vw, 528px\"  data-pk-src=\"https:\/\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-2.png\"  data-pk-srcset=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-2.png?w=528&amp;ssl=1 528w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-2.png?resize=120%2C43&amp;ssl=1 120w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-2.png?resize=90%2C32&amp;ssl=1 90w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-2.png?resize=320%2C113&amp;ssl=1 320w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-2.png?resize=240%2C85&amp;ssl=1 240w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-2.png?resize=180%2C64&amp;ssl=1 180w, 
https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-2.png?resize=300%2C106&amp;ssl=1 300w\" ><\/p>\n<p class=\"eq er dc bk es b et eu ev ew ex ey ez fa fb fc fd\" data-selectable-paragraph=\"\">Data can contain null values, in which case we have to fill those null cells with some values. So first of all we have to find out how many nulls each column contains.<\/p>\n<pre class=\"lang:default decode:true\">iris.isna().sum()<\/pre>\n<\/div>\n<p data-selectable-paragraph=\"\"><img  loading=\"lazy\"  decoding=\"async\"  data-attachment-id=\"5224\"  data-permalink=\"https:\/\/code4developers.com\/exploratory-data-analysis\/da-3\/\"  data-orig-file=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-3.png?fit=260%2C117&amp;ssl=1\"  data-orig-size=\"260,117\"  data-comments-opened=\"1\"  data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\"  data-image-title=\"DA 3\"  data-image-description=\"\"  data-image-caption=\"\"  data-medium-file=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-3.png?fit=260%2C117&amp;ssl=1\"  data-large-file=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-3.png?fit=260%2C117&amp;ssl=1\"  class=\"alignnone size-full wp-image-5224 pk-lazyload\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAQQAAAB1AQMAAACSzw2ZAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAABtJREFUSMftwTEBAAAAwqD1T20LL6AAAACAnwEPigABM7X3HQAAAABJRU5ErkJggg==\"  alt=\"DA 3\"  width=\"260\"  height=\"117\"  data-pk-sizes=\"auto\"  data-ls-sizes=\"auto, (max-width: 260px) 100vw, 260px\"  
data-pk-src=\"https:\/\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-3.png\"  data-pk-srcset=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-3.png?w=260&amp;ssl=1 260w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-3.png?resize=120%2C54&amp;ssl=1 120w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-3.png?resize=90%2C41&amp;ssl=1 90w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-3.png?resize=240%2C108&amp;ssl=1 240w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-3.png?resize=180%2C81&amp;ssl=1 180w\" ><\/p>\n<p data-selectable-paragraph=\"\">Here species is the target column, meaning we have to classify the data by species. So we check which unique species exist in the dataframe and how many rows each one has.<\/p>\n<pre class=\"lang:default decode:true\">iris[\"species\"].value_counts()<\/pre>\n<p data-selectable-paragraph=\"\"><img  loading=\"lazy\"  decoding=\"async\"  data-attachment-id=\"5225\"  data-permalink=\"https:\/\/code4developers.com\/exploratory-data-analysis\/da-4\/\"  data-orig-file=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-4.png?fit=363%2C91&amp;ssl=1\"  data-orig-size=\"363,91\"  data-comments-opened=\"1\"  data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\"  data-image-title=\"DA 4\"  data-image-description=\"\"  data-image-caption=\"\"  data-medium-file=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-4.png?fit=300%2C75&amp;ssl=1\"  
data-large-file=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-4.png?fit=363%2C91&amp;ssl=1\"  class=\"alignnone wp-image-5225 size-full pk-lazyload\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAWsAAABbAQMAAACPhTYiAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAABtJREFUWMPtwTEBAAAAwqD1T20ND6AAAAAADg0QtQABQxDpwgAAAABJRU5ErkJggg==\"  alt=\"Exploratory Data Analysis\"  width=\"363\"  height=\"91\"  data-pk-sizes=\"auto\"  data-ls-sizes=\"auto, (max-width: 363px) 100vw, 363px\"  data-pk-src=\"https:\/\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-4.png\"  data-pk-srcset=\"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-4.png?w=363&amp;ssl=1 363w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-4.png?resize=120%2C30&amp;ssl=1 120w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-4.png?resize=90%2C23&amp;ssl=1 90w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-4.png?resize=320%2C80&amp;ssl=1 320w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-4.png?resize=240%2C60&amp;ssl=1 240w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-4.png?resize=180%2C45&amp;ssl=1 180w, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/DA-4.png?resize=300%2C75&amp;ssl=1 300w\" ><\/p>\n<h4 id=\"2-d-scatter-plot\"><strong class=\"es hl\">2D Scatter plot<\/strong><\/h4>\n<p>Two-dimensional scatterplots visualize a relation (correlation) between two variables X and Y. Individual data points are represented in two-dimensional space, where the axes represent the variables. 
The two coordinates that determine the location of each point correspond to its specific values on the two variables.<\/p>\n<pre class=\"lang:default decode:true\">sns.set_style(\"whitegrid\")\r\n# hue colours the points by species; height is the height of the facet in inches\r\nsns.FacetGrid(iris, hue=\"species\", height=4) \\\r\n.map(plt.scatter, \"sepal_length\", \"sepal_width\") \\\r\n.add_legend()\r\nplt.show()<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm hm\">\n<div class=\"gv r go gw\">\n<div class=\"hn r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"358\"  height=\"287\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/358\/1*ZllptaJjual1DLXdFNzUWQ.png\" ><\/div>\n<div class=\"hn r\">Here each species is displayed in a different colour. A 2D scatter plot lets us observe how well the categories are separated from each other by two continuous features. To compare every pair of attributes we would have to draw many 2D scatter plots. A pairplot solves this problem! 
A pairplot draws a 2D scatter plot for each pair of attributes.<\/div>\n<\/div>\n<\/div>\n<\/figure>\n<pre class=\"lang:default decode:true\">sns.set_style(\"whitegrid\")\r\nsns.pairplot(iris, hue=\"species\", height=3)\r\nplt.show()<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"gm gn go gp ak\">\n<div class=\"cl cm ho\">\n<div class=\"gv r go gw\">\n<div class=\"hp r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"937\"  height=\"853\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/937\/1*BM6rkDnllBW0cNabwf6b5A.png\" ><\/div>\n<div><\/div>\n<div><strong>Observations:<\/strong><\/div>\n<div>petal_length and petal_width are the most useful features for identifying the flower types. While setosa can be easily identified (linearly separable), virginica and versicolor have some overlap (almost linearly separable). We can find \u201clines\u201d and \u201cif-else\u201d conditions to build a simple model to classify the flower types.<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/figure>\n<div><\/div>\n<div><span style=\"font-size: 22px; font-weight: bold;\">Histogram<\/span><\/div>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"gm gn go gp ak\">\n<div class=\"cl cm ho\">\n<p id=\"077c\" data-selectable-paragraph=\"\">A histogram is an approximate representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable. 
It differs from a bar graph in the sense that a bar graph relates two variables, but a histogram relates only one.<\/p>\n<pre class=\"lang:default decode:true\"># Note: sns.distplot is deprecated in recent seaborn releases (histplot\/displot replace it)\r\nsns.FacetGrid(iris, hue=\"species\", height=5) \\\r\n.map(sns.distplot, \"petal_length\") \\\r\n.add_legend()\r\nplt.show()<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm hq\">\n<div class=\"gv r go gw\">\n<div class=\"hr r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"479\"  height=\"409\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/479\/1*jnNpS_ykADY2NXHHJ5XViA.png\" ><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<pre class=\"lang:default decode:true\">sns.FacetGrid(iris, hue=\"species\", height=5) \\\r\n.map(sns.distplot, \"petal_width\") \\\r\n.add_legend()\r\nplt.show()<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm hs\">\n<div class=\"gv r go gw\">\n<div class=\"ht r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"477\"  height=\"415\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/477\/1*i5tCHxzA7L9ADw2s6RwyWQ.png\" ><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<pre class=\"lang:default decode:true\">sns.FacetGrid(iris, hue=\"species\", height=5) \\\r\n.map(sns.distplot, \"sepal_length\") 
\\\r\n.add_legend()\r\nplt.show()<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm hu\">\n<div class=\"gv r go gw\">\n<div class=\"hv r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"478\"  height=\"395\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/478\/1*ApTD-Va11b8DYy79Uo9wSA.png\" ><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<pre class=\"lang:default decode:true\">sns.FacetGrid(iris, hue=\"species\", height=5) \\\r\n.map(sns.distplot, \"sepal_width\") \\\r\n.add_legend()\r\nplt.show()<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm hw\">\n<div class=\"gv r go gw\">\n<div class=\"hx r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"484\"  height=\"398\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/484\/1*z1dEmyUk0DyTh0FOkpz-Lg.png\" ><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<h4 id=\"pdf-and-cdf\">PDF and CDF<\/h4>\n<ul>\n<li>PDF (Probability Density Function): This is the probability law for a continuous random variable, say X (for a discrete variable it is the probability mass function).<\/li>\n<\/ul>\n<p id=\"9cb2\" style=\"padding-left: 40px;\" data-selectable-paragraph=\"\">The probability law defines the chances of the random variable taking a particular value, say x, i.e. 
P(X = x).<br \/>\nHowever, this definition is not valid for continuous random variables because the probability at any single point is zero.<br \/>\nAn alternative is the limit pdf(x) = P(x - e &lt; X &lt;= x) \/ e as e tends to zero.<\/p>\n<ul>\n<li>CDF (Cumulative Distribution Function): As the name cumulative suggests, this is simply the probability up to a particular value of the random variable, say x. It is generally denoted by F, with F(x) = P(X &lt;= x) for any value x in the range of X. It is defined for both discrete and continuous random variables.<\/li>\n<\/ul>\n<pre class=\"lang:default decode:true \">iris_setosa = iris.loc[iris[\"species\"] == \"setosa\"]\r\n# Histogram with 10 bins; density=True returns a density value per bin\r\ncounts, bin_edges = np.histogram(iris_setosa['petal_length'], bins=10,\r\n                                 density=True)\r\npdf = counts \/ sum(counts)  # normalise so the bin values sum to 1\r\ncdf = np.cumsum(pdf)\r\nplt.plot(bin_edges[1:], pdf)\r\nplt.plot(bin_edges[1:], cdf)\r\n\r\n# Same PDF estimate with 20 bins for comparison\r\ncounts, bin_edges = np.histogram(iris_setosa['petal_length'], bins=20,\r\n                                 density=True)\r\npdf = counts \/ sum(counts)\r\nplt.plot(bin_edges[1:], pdf)\r\n\r\nplt.show()<\/pre>\n<p><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"430\"  height=\"283\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/430\/1*N3mJdPE7lKSPZnCZlj6BUw.png\" ><\/p>\n<p id=\"5dbd\" data-selectable-paragraph=\"\">We can plot the distributions of the different classes on a single plot. This lets us compare and visualize the distribution of an attribute across classes.<\/p>\n<p id=\"6aa5\" data-selectable-paragraph=\"\">We have observed that petal_length is the best attribute for differentiating the classes. 
So below is the plot of petal_length for each class.<\/p>\n<pre class=\"lang:default decode:true\">iris_setosa = iris.loc[iris[\"species\"] == \"setosa\"]\r\niris_virginica = iris.loc[iris[\"species\"] == \"virginica\"]\r\niris_versicolor = iris.loc[iris[\"species\"] == \"versicolor\"]\r\n\r\n# setosa\r\ncounts, bin_edges = np.histogram(iris_setosa['petal_length'], bins=10,\r\n                                 density=True)\r\npdf = counts \/ sum(counts)\r\ncdf = np.cumsum(pdf)\r\nplt.plot(bin_edges[1:], pdf)\r\nplt.plot(bin_edges[1:], cdf)\r\n\r\n# virginica\r\ncounts, bin_edges = np.histogram(iris_virginica['petal_length'], bins=10,\r\n                                 density=True)\r\npdf = counts \/ sum(counts)\r\ncdf = np.cumsum(pdf)\r\nplt.plot(bin_edges[1:], pdf)\r\nplt.plot(bin_edges[1:], cdf)\r\n\r\n# versicolor\r\ncounts, bin_edges = np.histogram(iris_versicolor['petal_length'], bins=10,\r\n                                 density=True)\r\npdf = counts \/ sum(counts)\r\ncdf = np.cumsum(pdf)\r\nplt.plot(bin_edges[1:], pdf)\r\nplt.plot(bin_edges[1:], cdf)\r\n\r\nplt.show()<\/pre>\n<p>&nbsp;<\/p>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm ia\">\n<div class=\"gv r go gw\">\n<div class=\"ib r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"468\"  height=\"279\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/468\/1*vTSx-mh9pHybdiY3kBe6cw.png\" ><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p>&nbsp;<\/p>\n<h4 id=\"mean-variance-and-std-dev\">Mean, Variance and Std-dev<\/h4>\n<p id=\"2a42\" data-selectable-paragraph=\"\">Now let\u2019s observe core characteristics of the data like mean, median, variance and standard deviation for each 
class.<\/p>\n<pre class=\"lang:default decode:true\">print(\"Means:\")\r\nprint(np.mean(iris_setosa[\"petal_length\"]))\r\nprint(np.mean(iris_virginica[\"petal_length\"]))\r\nprint(np.mean(iris_versicolor[\"petal_length\"]))\r\n\r\nprint(\"\\nStd-dev:\")\r\nprint(np.std(iris_setosa[\"petal_length\"]))\r\nprint(np.std(iris_virginica[\"petal_length\"]))\r\nprint(np.std(iris_versicolor[\"petal_length\"]))\r\n\r\nprint(\"\\nMedians:\")\r\nprint(np.median(iris_setosa[\"petal_length\"]))\r\nprint(np.median(iris_virginica[\"petal_length\"]))\r\nprint(np.median(iris_versicolor[\"petal_length\"]))<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm ic\">\n<div class=\"gv r go gw\">\n<div class=\"id r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  width=\"242\"  height=\"233\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/242\/1*fNEX86z_SBiJVioGAlJzJw.png\" ><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"3f34\" data-selectable-paragraph=\"\">With the help of the describe() method we can observe the mean, median, minimum, maximum and spread of each feature.<\/p>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm ie\">\n<div class=\"gv r go gw\">\n<div class=\"if r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  width=\"480\"  height=\"291\"  data-pk-sizes=\"auto\"  
data-pk-src=\"https:\/\/miro.medium.com\/max\/480\/1*w0WiK1a6tocjhYDAdpqwcA.png\" ><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<h4 id=\"box-plot\">Box plot<\/h4>\n<p id=\"c8ef\" data-selectable-paragraph=\"\">A boxplot is a standardized way of displaying the distribution of data based on a five-number summary (\u201cminimum\u201d, first quartile (Q1), median, third quartile (Q3), and \u201cmaximum\u201d). It can tell you about your outliers and what their values are. It can also tell you if your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed.<\/p>\n<pre class=\"lang:default decode:true\">sns.boxplot(x='species', y='petal_length', data=iris)<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm ig\">\n<div class=\"gv r go gw\">\n<div class=\"ih r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"407\"  height=\"274\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/407\/1*l_tLc7-z4Zk1NztyEptRkg.png\" ><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<pre class=\"lang:default decode:true\">sns.boxplot(x='species', y='petal_width', data=iris)<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm ii\">\n<div class=\"gv r go gw\">\n<div class=\"ij r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"442\"  
height=\"277\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/442\/1*nrKPy1Hp5NS5ah5mEyeM3Q.png\" ><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<pre class=\"lang:default decode:true\">sns.boxplot(x='species', y='sepal_width', data=iris)<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm ik\">\n<div class=\"gv r go gw\">\n<div class=\"il r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"426\"  height=\"282\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/426\/1*uZs7u4KFWwVc2sM4sjDfbQ.png\" ><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<pre class=\"lang:default decode:true\">sns.boxplot(x='species', y='sepal_length', data=iris)<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm im\">\n<div class=\"gv r go gw\">\n<div class=\"in r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"416\"  height=\"280\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/416\/1*GnUsnyy_avPZSipsOOIx4A.png\" ><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"3222\" data-selectable-paragraph=\"\"><strong>Conclusion<\/strong>: The \u2018petal_length\u2019 attribute separates the classes well.<\/p>\n<h4 id=\"correlation-ratio\">Correlation ratio<\/h4>\n<p id=\"f3dd\" 
data-selectable-paragraph=\"\">In statistics, the correlation ratio is a measure of the relationship between the statistical <a href=\"https:\/\/en.wikipedia.org\/wiki\/Statistical_dispersion\" target=\"_blank\" rel=\"noopener noreferrer\">dispersion<\/a> within individual categories and the dispersion across the whole population or sample. Note that the heatmap below does not plot the correlation ratio itself: DataFrame.corr() computes the pairwise Pearson correlation coefficient between the numeric columns.<\/p>\n<pre class=\"lang:default decode:true\">plt.figure(figsize=(5, 5))\r\n# corr() needs numeric data, so only the numeric columns are passed\r\nsns.heatmap(iris.select_dtypes(\"number\").corr(),\r\n            vmin=-1,\r\n            cmap='coolwarm',\r\n            annot=True)<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm io\">\n<div class=\"gv r go gw\">\n<div class=\"ip r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"343\"  height=\"370\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/343\/1*oWOiHI7GbasCucB0jPs7gw.png\" ><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p>&nbsp;<\/p>\n<h4 id=\"skewness-and-kurtosis\">Skewness and Kurtosis<\/h4>\n<p id=\"bec3\" data-selectable-paragraph=\"\">Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.<\/p>\n<p id=\"cad0\" data-selectable-paragraph=\"\">Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data sets with low kurtosis tend to have light tails, or lack of outliers. 
A uniform distribution would be the extreme case.<\/p>\n<pre class=\"lang:default decode:true\">iris.skew(axis=0, skipna=True, numeric_only=True)<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm iq\">\n<div class=\"gv r go gw\">\n<div class=\"ir r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"218\"  height=\"91\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/218\/1*DkV7MceRjwKR0bEjkpi3CA.png\" ><\/div>\n<div><\/div>\n<div>If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.<\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"78c0\" data-selectable-paragraph=\"\">If the skewness is between -1 and -0.5 (negatively skewed) or between 0.5 and 1 (positively skewed), the data are moderately skewed.<\/p>\n<p data-selectable-paragraph=\"\">If the skewness is less than -1 (negatively skewed) or greater than 1 (positively skewed), the data are highly skewed.<\/p>\n<pre id=\"eb9d\" class=\"lang:default decode:true\">iris.kurtosis(axis=0, numeric_only=True)<\/pre>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm is\">\n<div class=\"gv r go gw\">\n<div class=\"it r\"><img  loading=\"lazy\"  decoding=\"async\"  class=\"ma qf cp t u gs ak hb alignnone pk-lazyload\"  role=\"presentation\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  alt=\"Exploratory Data Analysis\"  width=\"251\"  height=\"106\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/miro.medium.com\/max\/251\/1*dLplzvGTPoq9cB44M_7evA.png\" 
><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<\/div>\n<\/div>\n<\/figure>\n<div><\/div>\n<div><\/div>\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"gm gn go gp ak\">\n<div class=\"cl cm ho\">\n<figure class=\"gg gh gi gj gk gl cl cm paragraph-image\">\n<div class=\"cl cm is\">\n<div class=\"gv r go gw\">\n<div>The pandas kurtosis() method reports excess kurtosis (Fisher\u2019s definition), for which a normal distribution scores 0; this corresponds to a kurtosis of 3 under Pearson\u2019s definition. A value well above that baseline indicates heavy tails and likely outliers, which should be inspected and, where appropriate, removed.<\/div>\n<div><\/div>\n<\/div>\n<\/div>\n<\/figure>\n<p id=\"1251\" data-selectable-paragraph=\"\">In conclusion, EDA helps us understand a dataset through graphs and summary statistics of its numerical features.<\/p>\n<\/div>\n<\/div>\n<\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Machine learning is an application of AI(Artificial Intelligence) that makes computers to learn themselves from given data without being explicitly programmed. Now days computers are much powerful that they can&hellip;<\/p>\n","protected":false},"author":1177,"featured_media":5202,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[304,305],"tags":[303,307,306],"powerkit_post_featured":[],"class_list":{"0":"post-5203","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science","8":"category-machine-learning","9":"tag-data-science","10":"tag-exploratory-data-analysis","11":"tag-machine-learning"},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2019\/10\/1_Ra02AqsQlC0KV229EvM98g.jpg?fit=702%2C336&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8NAi4-
1lV","jetpack-related-posts":[{"id":8128,"url":"https:\/\/code4developers.com\/why-python-becomes-essential-to-learn-in-21st-century\/","url_meta":{"origin":5203,"position":0},"title":"Why Python becomes essential to learn in 21st century?","author":"Shyam Desai","date":"May 29, 2020","format":false,"excerpt":"Many programming languages are being used by developers in the corporate world then why python becomes essential to learn in this decade. The whole article is written about how the language is useful for all kinds of developers without seeing that he or she is a web developer, software developer,\u2026","rel":"","context":"In &quot;Python&quot;","block_context":{"text":"Python","link":"https:\/\/code4developers.com\/category\/python\/"},"img":{"alt_text":"python","src":"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2020\/05\/Python.png?fit=1200%2C1200&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2020\/05\/Python.png?fit=1200%2C1200&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2020\/05\/Python.png?fit=1200%2C1200&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2020\/05\/Python.png?fit=1200%2C1200&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2020\/05\/Python.png?fit=1200%2C1200&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":12274,"url":"https:\/\/code4developers.com\/new-learning-models-for-fall-2020\/","url_meta":{"origin":5203,"position":1},"title":"New Learning Models for Fall 2020","author":"Code4Developers","date":"July 15, 2020","format":false,"excerpt":"Last spring, the pandemic upended much of what we consider traditional teaching and learning. 
Educators, students, and parents were forced to triage learning via a variety of online strategies that stretched from handing out digital worksheets to holding class online at the same time every day.\u00a0With a little less a\u2026","rel":"","context":"Similar post","block_context":{"text":"Similar post","link":""},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":3715,"url":"https:\/\/code4developers.com\/google-deepminds-ai-algorithm\/","url_meta":{"origin":5203,"position":2},"title":"Google DeepMind\u2019s AI Algorithm &#8211; Create 3D Models From Regular 2D Images","author":"Arif Khoja","date":"June 15, 2018","format":false,"excerpt":"Google\u2019s London-based AI subsidiary, DeepMind, has developed an algorithm that can render full 3D models of objects and scenes from regular 2D images. Called the Generative Query Network (GQN), the new algorithm can be used for a wide range of applications, including robotic vision, VR simulation, and more. All this,\u2026","rel":"","context":"In &quot;News&quot;","block_context":{"text":"News","link":"https:\/\/code4developers.com\/category\/news\/"},"img":{"alt_text":"deepmind-logo","src":"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2018\/06\/deepmind-logo.jpg?fit=1200%2C600&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2018\/06\/deepmind-logo.jpg?fit=1200%2C600&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2018\/06\/deepmind-logo.jpg?fit=1200%2C600&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2018\/06\/deepmind-logo.jpg?fit=1200%2C600&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2018\/06\/deepmind-logo.jpg?fit=1200%2C600&ssl=1&resize=1050%2C600 
3x"},"classes":[]},{"id":3143,"url":"https:\/\/code4developers.com\/looping-without-loops\/","url_meta":{"origin":5203,"position":3},"title":"Looping without loops.","author":"Arif Khoja","date":"November 29, 2017","format":false,"excerpt":"As a coder or an aspiring one you have probably experienced many moments where all the smokes blows away and you understand something much clearer. One of these moments for me was when I was introduced to recursion. Probably, while learning Scheme or Haskell. The basic idea is that a\u2026","rel":"","context":"In &quot;JavaScript&quot;","block_context":{"text":"JavaScript","link":"https:\/\/code4developers.com\/category\/javascript\/"},"img":{"alt_text":"javascript","src":"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2017\/10\/javascript.jpg?fit=750%2C422&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2017\/10\/javascript.jpg?fit=750%2C422&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2017\/10\/javascript.jpg?fit=750%2C422&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2017\/10\/javascript.jpg?fit=750%2C422&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":14544,"url":"https:\/\/code4developers.com\/reflections-on-10years-in-software-engineering\/","url_meta":{"origin":5203,"position":4},"title":"Celebrating a Decade of Phenomenal Growth: Insights and Reflections on 10 Years of Software Engineering","author":"Yatendrasinh Joddha","date":"July 1, 2023","format":false,"excerpt":"Ten years ago (1st July 2013), I embarked on a journey that has since defined my life in so many ways. I started my career as a Trainee Developer, filled with a deep passion to build a successful career in the ever-evolving field of software engineering. 
Today, I find myself\u2026","rel":"","context":"In &quot;experience&quot;","block_context":{"text":"experience","link":"https:\/\/code4developers.com\/category\/experience\/"},"img":{"alt_text":"Software Engineering","src":"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2023\/06\/Purple-Stars-Celebrating-10-Years-in-Business-Instagram-Post-e1688130688482.png?fit=400%2C400&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]},{"id":3457,"url":"https:\/\/code4developers.com\/native-html5-games-with-phaser-and-capacitor\/","url_meta":{"origin":5203,"position":5},"title":"Native HTML5 Games with Phaser and Capacitor","author":"Arif Khoja","date":"April 17, 2018","format":false,"excerpt":"Building HTML5 games with Phaser has been somewhat of a hobby of mine over the past few years, and I will be writing a few tutorials about Phaser with a focus on developing HTML5 games for mobile. However, despite having created a few mostly finished games most of them never\u2026","rel":"","context":"In 
&quot;Phaser&quot;","block_context":{"text":"Phaser","link":"https:\/\/code4developers.com\/category\/phaser\/"},"img":{"alt_text":"phaser-starter-demo","src":"https:\/\/i0.wp.com\/code4developers.com\/wp-content\/uploads\/2018\/04\/phaser-starter-demo-1-300x227.gif?resize=350%2C200","width":350,"height":200},"classes":[]}],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/code4developers.com\/wp-json\/wp\/v2\/posts\/5203","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/code4developers.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/code4developers.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/code4developers.com\/wp-json\/wp\/v2\/users\/1177"}],"replies":[{"embeddable":true,"href":"https:\/\/code4developers.com\/wp-json\/wp\/v2\/comments?post=5203"}],"version-history":[{"count":6,"href":"https:\/\/code4developers.com\/wp-json\/wp\/v2\/posts\/5203\/revisions"}],"predecessor-version":[{"id":5230,"href":"https:\/\/code4developers.com\/wp-json\/wp\/v2\/posts\/5203\/revisions\/5230"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/code4developers.com\/wp-json\/wp\/v2\/media\/5202"}],"wp:attachment":[{"href":"https:\/\/code4developers.com\/wp-json\/wp\/v2\/media?parent=5203"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/code4developers.com\/wp-json\/wp\/v2\/categories?post=5203"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/code4developers.com\/wp-json\/wp\/v2\/tags?post=5203"},{"taxonomy":"powerkit_post_featured","embeddable":true,"href":"https:\/\/code4developers.com\/wp-json\/wp\/v2\/powerkit_post_featured?post=5203"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}