{"id":15045,"date":"2021-08-17T08:30:00","date_gmt":"2021-08-17T06:30:00","guid":{"rendered":"https:\/\/rubikscode.net\/?p=15045"},"modified":"2021-08-16T18:11:03","modified_gmt":"2021-08-16T16:11:03","slug":"ml-optimization-pt-3-hyperparameter-optimization-with-python","status":"publish","type":"post","link":"https:\/\/rubikscode.net\/2021\/08\/17\/ml-optimization-pt-3-hyperparameter-optimization-with-python\/","title":{"rendered":"Guide to Hyperparameter Tuning and Optimization with Python"},"content":{"rendered":"\n[et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;4.4.6&#8243;][et_pb_row _builder_version=&#8221;4.4.6&#8243;][et_pb_column type=&#8221;4_4&#8243; _builder_version=&#8221;4.4.6&#8243;][et_pb_code _builder_version=&#8221;4.4.6&#8243; locked=&#8221;off&#8221;]<!-- Begin Mailchimp Signup Form --><!-- [et_pb_line_break_holder] --><link href=\"\/\/cdn-images.mailchimp.com\/embedcode\/classic-10_7.css\" rel=\"stylesheet\" type=\"text\/css\"><!-- [et_pb_line_break_holder] --><style type=\"text\/css\"><!-- [et_pb_line_break_holder] -->\t#mc_embed_signup{background:#fff; clear:left; font:14px Helvetica,Arial,sans-serif; }<!-- [et_pb_line_break_holder] -->\t\/* Add your own Mailchimp form style overrides in your site stylesheet or in this style block.<!-- [et_pb_line_break_holder] -->\t   We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. 
*\/<!-- [et_pb_line_break_holder] --><\/style><!-- [et_pb_line_break_holder] --><div id=\"mc_embed_signup\"><!-- [et_pb_line_break_holder] --><form action=\"https:\/\/rubikscode.us3.list-manage.com\/subscribe\/post?u=a175fa8b52c303e69c38f64b5&#038;id=f1f074045f\" method=\"post\" id=\"mc-embedded-subscribe-form\" name=\"mc-embedded-subscribe-form\" class=\"validate\" target=\"_blank\" novalidate><!-- [et_pb_line_break_holder] -->    <div id=\"mc_embed_signup_scroll\"><!-- [et_pb_line_break_holder] -->\t<h4>The code that accompanies this article can be received after subscription<\/h4><!-- [et_pb_line_break_holder] -->      <input type=\"hidden\" name=\"OPTIN\" value=\"hyper\"><!-- [et_pb_line_break_holder] --><div class=\"indicates-required\"><span class=\"asterisk\">*<\/span> indicates required<\/div><!-- [et_pb_line_break_holder] --><div class=\"mc-field-group\"><!-- [et_pb_line_break_holder] -->\t<label for=\"mce-EMAIL\">Email Address  <span class=\"asterisk\">*<\/span><!-- [et_pb_line_break_holder] --><\/label><!-- [et_pb_line_break_holder] -->\t<input type=\"email\" value=\"\" name=\"EMAIL\" class=\"required email\" id=\"mce-EMAIL\"><!-- [et_pb_line_break_holder] --><\/div><!-- [et_pb_line_break_holder] -->\t<div id=\"mce-responses\" class=\"clear\"><!-- [et_pb_line_break_holder] -->\t\t<div class=\"response\" id=\"mce-error-response\" style=\"display:none\"><\/div><!-- [et_pb_line_break_holder] -->\t\t<div class=\"response\" id=\"mce-success-response\" style=\"display:none\"><\/div><!-- [et_pb_line_break_holder] -->\t<\/div>    <!-- real people should not fill this in and expect good things - do not remove this or risk form bot signups--><!-- [et_pb_line_break_holder] -->    <div style=\"position: absolute; left: -5000px;\" aria-hidden=\"true\"><input type=\"text\" name=\"b_a175fa8b52c303e69c38f64b5_f1f074045f\" tabindex=\"-1\" value=\"\"><\/div><!-- [et_pb_line_break_holder] -->    <div class=\"clear\"><input type=\"submit\" value=\"Subscribe\" 
name=\"subscribe\" id=\"mc-embedded-subscribe\" class=\"button\"><\/div><!-- [et_pb_line_break_holder] -->    <\/div><!-- [et_pb_line_break_holder] --><\/form><!-- [et_pb_line_break_holder] --><\/div><!-- [et_pb_line_break_holder] --><script type='text\/javascript' src='\/\/s3.amazonaws.com\/downloads.mailchimp.com\/js\/mc-validate.js'><\/script><script type='text\/javascript'>(function($) {window.fnames = new Array(); window.ftypes = new Array();fnames[0]='EMAIL';ftypes[0]='email';fnames[1]='OPTIN';ftypes[1]='text';}(jQuery));var $mcj = jQuery.noConflict(true);<\/script><!-- [et_pb_line_break_holder] --><!--End mc_embed_signup-->[\/et_pb_code][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">In a previous couple of articles, we were specifically focused on machine learning algorithms&#8217;\u00a0<a href=\"https:\/\/rubikscode.net\/ultimate-guide-to-machine-learning-with-python\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>performance<\/strong><\/a>. We talked about how to quantify machine learning model performance and how to improve it with <strong><a href=\"https:\/\/rubikscode.net\/ultimate-guide-to-machine-learning-with-python\/\" target=\"_blank\" rel=\"noopener noreferrer\">regularization<\/a><\/strong>. Apart from that, we covered the topic optimization techniques, both basic ones like <strong><a href=\"https:\/\/rubikscode.net\/2021\/06\/28\/ml-optimization-pt-1-gradient-descent-with-python\/\" target=\"_blank\" rel=\"noopener noreferrer\">Gradient Descent<\/a><\/strong> and <strong>advanced ones<\/strong>, like Adam.<\/p>\n<p style=\"text-align: justify;\">It is pretty surreal how completely different sub-branches grew around the concept of model optimization. 
One of those sub-branches is hyperparameter optimization, or hyperparameter tuning.<\/p>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243; custom_margin=&#8221;||1px|||&#8221; border_width_top=&#8221;1px&#8221; border_color_top=&#8221;rgba(51,51,51,0.38)&#8221; locked=&#8221;off&#8221;][\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243; hover_enabled=&#8221;0&#8243;]<p style=\"text-align: justify;\">In this article, you can find:<\/p>\n<ol>\n<li style=\"text-align: justify; padding-left: 30px;\"><a href=\"#hyper\"><strong>Hyperparameters in Machine Learning<\/strong><\/a><\/li>\n<li style=\"text-align: justify; padding-left: 30px;\"><a href=\"#data\"><strong>Prerequisites and Data<\/strong><\/a><\/li>\n<li style=\"text-align: 
justify; padding-left: 30px;\"><a href=\"#grid\"><strong>Grid Search Hyperparameter Tuning<\/strong><\/a><\/li>\n<li style=\"text-align: justify; padding-left: 30px;\"><a href=\"#random\"><strong>Random Search Hyperparameter Tuning<\/strong><\/a><\/li>\n<li style=\"text-align: justify; padding-left: 30px;\"><a href=\"#bay\"><strong>Bayesian Hyperparameter Optimization<\/strong><\/a><\/li>\n<li style=\"text-align: justify; padding-left: 30px;\"><a href=\"#halving\"><strong>Halving Grid Search &amp; Halving Random Search<\/strong><\/a><\/li>\n<li style=\"text-align: justify; padding-left: 30px;\"><a href=\"#alter\"><strong>Alternatives<\/strong><\/a><\/li>\n<\/ol>\n[\/et_pb_text][et_pb_text module_id=&#8221;hyper&#8221; _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221; hover_enabled=&#8221;0&#8243;]<h2>1. Hyperparameters in Machine Learning<\/h2>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">Hyperparameters are an <strong>integral<\/strong> part of every machine learning and deep learning algorithm. Unlike standard machine learning parameters, which are learned by the algorithm itself (like <em>w<\/em> and <em>b<\/em> in linear regression, or connection weights in a neural network), hyperparameters are set by the engineer <strong>before<\/strong> the training process.<\/p>\n<p style=\"text-align: justify;\">They are external factors, fully defined by the engineer, that control the behavior of the learning algorithm. Do you need some examples? The <em>learning rate<\/em> is one of the most famous hyperparameters, <em>C<\/em> in SVM is a hyperparameter, the maximal depth of a Decision Tree is a hyperparameter, etc. 
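To make the distinction concrete, here is a minimal sketch of the difference between a hyperparameter and learned parameters; it is a toy gradient descent on synthetic data, not part of this article's code:

```python
import numpy as np

# Hyperparameter: chosen by the engineer BEFORE training starts.
learning_rate = 0.1

# Parameters: learned by the algorithm itself during training.
w, b = 0.0, 0.0

# Toy data generated from y = 2x + 1.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * X + 1

for _ in range(1000):
    y_hat = w * X + b
    # Gradient of the mean squared error with respect to w and b.
    w -= learning_rate * (2 * (y_hat - y) * X).mean()
    b -= learning_rate * (2 * (y_hat - y)).mean()

print(round(w, 2), round(b, 2))  # parameters converge to ~2.0 and ~1.0
```

Changing `learning_rate` changes how training behaves, but it is never updated by training itself; that is exactly what makes it a hyperparameter.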
These can all be set manually by the engineer.<\/p>[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2021\/07\/undraw_Chat_bot_re_e2gj.png&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2021\/07\/undraw_Chat_bot_re_e2gj.png&#038;#8221<\/a>; alt=&#8221;AI Visual&#8221; title_text=&#8221;AI Visual&#8221; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221; locked=&#8221;off&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">However, setting them manually for multiple experiments quickly becomes <strong>tiresome<\/strong>. That is where hyperparameter optimization comes in. The main goal of these techniques is to find the hyperparameters of a given machine learning algorithm that <strong>deliver<\/strong> the best performance as measured on a validation set. In this tutorial, we explore several techniques that can find the best hyperparameters for you.<\/p>[\/et_pb_text][et_pb_text module_id=&#8221;data&#8221; _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221; hover_enabled=&#8221;0&#8243;]<h2>2. 
Prerequisites &amp; Data<\/h2>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221;]<h3>2.1 Prerequisites and Libraries<\/h3>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243; locked=&#8221;off&#8221;]<p style=\"text-align: justify;\">For the purpose of this article, make sure that you have installed the following <em>Python\u00a0<\/em>libraries:<\/p>\n<ul style=\"text-align: justify;\">\n<li><strong>NumPy\u00a0<\/strong>&#8211; Follow <a href=\"https:\/\/numpy.org\/install\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>this guide<\/strong><\/a> if you need help with installation.<\/li>\n<li><strong>Scikit-Learn\u00a0<\/strong>&#8211; Follow <a href=\"https:\/\/scikit-learn.org\/stable\/install.html\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>this guide<\/strong><\/a> if you need help with installation.<\/li>\n<li><strong>SciPy\u00a0<\/strong>&#8211; Follow <a href=\"https:\/\/www.scipy.org\/install.html\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>this guide<\/strong><\/a> if you need help with installation.<\/li>\n<li><strong>Scikit-Optimize<\/strong> &#8211; Follow <a href=\"https:\/\/scikit-optimize.github.io\/stable\/install.html\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>this guide<\/strong><\/a> if you need help with installation.<\/li>\n<\/ul>\n<p style=\"text-align: justify;\">Once installed, make sure that you have imported all the necessary modules used in this tutorial.<\/p>[\/et_pb_text][et_pb_dmb_code_snippet 
code=&#8221;aW1wb3J0IHBhbmRhcyBhcyBwZAppbXBvcnQgbnVtcHkgYXMgbnAKaW1wb3J0IG1hdHBsb3RsaWIucHlwbG90IGFzIHBsdAoKCmZyb20gc2tsZWFybi5wcmVwcm9jZXNzaW5nIGltcG9ydCBTdGFuZGFyZFNjYWxlcgpmcm9tIHNrbGVhcm4ubW9kZWxfc2VsZWN0aW9uIGltcG9ydCB0cmFpbl90ZXN0X3NwbGl0CmZyb20gc2tsZWFybi5tZXRyaWNzIGltcG9ydCBmMV9zY29yZQoKZnJvbSBza2xlYXJuLm1vZGVsX3NlbGVjdGlvbiBpbXBvcnQgR3JpZFNlYXJjaENWLCBSYW5kb21pemVkU2VhcmNoQ1YKCmZyb20gc2tsZWFybi5leHBlcmltZW50YWwgaW1wb3J0IGVuYWJsZV9oYWx2aW5nX3NlYXJjaF9jdgpmcm9tIHNrbGVhcm4ubW9kZWxfc2VsZWN0aW9uIGltcG9ydCBIYWx2aW5nR3JpZFNlYXJjaENWLCBIYWx2aW5nUmFuZG9tU2VhcmNoQ1YKCmZyb20gc2tsZWFybi5zdm0gaW1wb3J0IFNWQwpmcm9tIHNrbGVhcm4uZW5zZW1ibGUgaW1wb3J0IFJhbmRvbUZvcmVzdFJlZ3Jlc3NvcgoKZnJvbSBzY2lweSBpbXBvcnQgc3RhdHMKZnJvbSBza29wdCBpbXBvcnQgQmF5ZXNTZWFyY2hDVgpmcm9tIHNrb3B0LnNwYWNlIGltcG9ydCBSZWFsLCBDYXRlZ29yaWNhbA==&#8221; _builder_version=&#8221;4.4.6&#8243; hover_enabled=&#8221;0&#8243;]aW1wb3J0IHBhbmRhcyBhcyBwZAppbXBvcnQgbnVtcHkgYXMgbnAKaW1wb3J0IG1hdHBsb3RsaWIucHlwbG90IGFzIHBsdAoKCmZyb20gc2tsZWFybi5wcmVwcm9jZXNzaW5nIGltcG9ydCBTdGFuZGFyZFNjYWxlcgpmcm9tIHNrbGVhcm4ubW9kZWxfc2VsZWN0aW9uIGltcG9ydCB0cmFpbl90ZXN0X3NwbGl0CmZyb20gc2tsZWFybi5tZXRyaWNzIGltcG9ydCBmMV9zY29yZQoKZnJvbSBza2xlYXJuLm1vZGVsX3NlbGVjdGlvbiBpbXBvcnQgR3JpZFNlYXJjaENWLCBSYW5kb21pemVkU2VhcmNoQ1YKCmZyb20gc2tsZWFybi5leHBlcmltZW50YWwgaW1wb3J0IGVuYWJsZV9oYWx2aW5nX3NlYXJjaF9jdgpmcm9tIHNrbGVhcm4ubW9kZWxfc2VsZWN0aW9uIGltcG9ydCBIYWx2aW5nR3JpZFNlYXJjaENWLCBIYWx2aW5nUmFuZG9tU2VhcmNoQ1YKCmZyb20gc2tsZWFybi5zdm0gaW1wb3J0IFNWQwpmcm9tIHNrbGVhcm4uZW5zZW1ibGUgaW1wb3J0IFJhbmRvbUZvcmVzdFJlZ3Jlc3NvcgoKZnJvbSBzY2lweSBpbXBvcnQgc3RhdHMKZnJvbSBza29wdCBpbXBvcnQgQmF5ZXNTZWFyY2hDVgpmcm9tIHNrb3B0LnNwYWNlIGltcG9ydCBSZWFsLCBDYXRlZ29yaWNhbA==[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243; locked=&#8221;off&#8221;]<p>Apart from that, it would be good to be at least familiar with the basics of\u00a0<a href=\"https:\/\/rubikscode.net\/2019\/04\/29\/mathematics-for-artificial-intelligence-linear-algebra\/\" target=\"_blank\" 
rel=\"noopener noreferrer\"><strong>linear algebra<\/strong><\/a>, <a href=\"https:\/\/rubikscode.net\/2019\/05\/13\/mathematics-for-artificial-intelligence-calculus-optimization\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>calculus<\/strong> <\/a>and <a href=\"https:\/\/rubikscode.net\/2019\/05\/06\/mathematics-for-artificial-intelligence-probability\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>probability<\/strong><\/a>.<\/p>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221;]<h3>2.2 Preparing Data<\/h3>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243; locked=&#8221;off&#8221;]<p style=\"text-align: justify;\">Data that we use in this article is from <strong>PalmerPenguins<\/strong> Dataset. This dataset has been recently introduced as an alternative to the famous Iris dataset. It is created by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER. You can obtain this dataset <a href=\"https:\/\/github.com\/allisonhorst\/palmerpenguins\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>here<\/strong><\/a>, or via Kaggle.<\/p>\n<p style=\"text-align: justify;\">This dataset is essentially composed of two datasets, each containing data of 344 penguins. Just like in Iris dataset there are 3 different species of penguins coming from 3 islands in the Palmer Archipelago. Also, these datasets contain <strong>culmen<\/strong> dimensions for each species. The culmen is the upper ridge of a bird\u2019s bill. 
In the simplified penguin data, culmen length and depth are renamed to the variables <em>culmen_length_mm<\/em> and <em>culmen_depth_mm<\/em>.<\/p>[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/07\/culmen_depth.png&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/07\/culmen_depth.png&#038;#8221<\/a>; title_text=&#8221;culmen_depth&#8221; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221; locked=&#8221;off&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">Since this dataset is labeled, we will be able to <strong>verify<\/strong> the results of our experiments. However, this is often not the case in practice, and validating results without ground-truth labels can be a hard and complicated process.<\/p>\n<p style=\"text-align: justify;\">Let&#8217;s load and prepare the <em>PalmerPenguins<\/em> dataset. First, we load the dataset and remove the features that we don\u2019t use in this article:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;ZGF0YSA9IHBkLnJlYWRfY3N2KCcuL2RhdGEvcGVuZ3VpbnNfc2l6ZS5jc3YnKQoKZGF0YSA9IGRhdGEuZHJvcG5hKCkKZGF0YSA9IGRhdGEuZHJvcChbJ3NleCcsICdpc2xhbmQnLCAnZmxpcHBlcl9sZW5ndGhfbW0nLCAnYm9keV9tYXNzX2cnXSwgYXhpcz0xKQ==&#8221; _builder_version=&#8221;4.4.6&#8243;]ZGF0YSA9IHBkLnJlYWRfY3N2KCcuL2RhdGEvcGVuZ3VpbnNfc2l6ZS5jc3YnKQoKZGF0YSA9IGRhdGEuZHJvcG5hKCkKZGF0YSA9IGRhdGEuZHJvcChbJ3NleCcsICdpc2xhbmQnLCAnZmxpcHBlcl9sZW5ndGhfbW0nLCAnYm9keV9tYXNzX2cnXSwgYXhpcz0xKQ==[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">Then we separate the input data and scale it:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet 
code=&#8221;WCA9IGRhdGEuZHJvcChbJ3NwZWNpZXMnXSwgYXhpcz0xKQoKc3MgPSBTdGFuZGFyZFNjYWxlcigpClggPSBzcy5maXRfdHJhbnNmb3JtKFgpIAoKeSA9IGRhdGFbJ3NwZWNpZXMnXQpzcGljaWVzID0geydBZGVsaWUnOiAwLCAnQ2hpbnN0cmFwJzogMSwgJ0dlbnRvbyc6IDJ9CnkgPSBbc3BpY2llc1tpdGVtXSBmb3IgaXRlbSBpbiB5XQp5ID0gbnAuYXJyYXkoeSkg&#8221; _builder_version=&#8221;4.4.6&#8243;]WCA9IGRhdGEuZHJvcChbJ3NwZWNpZXMnXSwgYXhpcz0xKQoKc3MgPSBTdGFuZGFyZFNjYWxlcigpClggPSBzcy5maXRfdHJhbnNmb3JtKFgpIAoKeSA9IGRhdGFbJ3NwZWNpZXMnXQpzcGljaWVzID0geydBZGVsaWUnOiAwLCAnQ2hpbnN0cmFwJzogMSwgJ0dlbnRvbyc6IDJ9CnkgPSBbc3BpY2llc1tpdGVtXSBmb3IgaXRlbSBpbiB5XQp5ID0gbnAuYXJyYXkoeSkg[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">Finally, we split data into training and testing datasets:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;WF90cmFpbiwgWF90ZXN0LCB5X3RyYWluLCB5X3Rlc3QgPSB0cmFpbl90ZXN0X3NwbGl0KFgsIHksIHRlc3Rfc2l6ZT0wLjIsIHJhbmRvbV9zdGF0ZT0zMyk=&#8221; _builder_version=&#8221;4.4.6&#8243;]WF90cmFpbiwgWF90ZXN0LCB5X3RyYWluLCB5X3Rlc3QgPSB0cmFpbl90ZXN0X3NwbGl0KFgsIHksIHRlc3Rfc2l6ZT0wLjIsIHJhbmRvbV9zdGF0ZT0zMyk=[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">When we plot the data here is what it looks like:<\/p>[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/11\/2.jpg&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/11\/2.jpg&#038;#8221<\/a>; title_text=&#8221;2&#8243; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221; hover_enabled=&#8221;0&#8243; module_id=&#8221;grid&#8221;]<h2>3. Grid Search Hyperparameter Tuning<\/h2>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">Manual hyperparameter tuning is <strong>slow<\/strong> and tiresome. 
That is why we explore the first and simplest hyperparameter optimization technique &#8211; <strong>Grid Search<\/strong>. This technique speeds up the process and is one of the most widely used hyperparameter optimization techniques. In essence, it <strong>automates<\/strong> the trial-and-error process. We provide a <strong>list<\/strong> of candidate values for each hyperparameter, and the algorithm builds a model for each possible combination, evaluates it, and selects the values that provide the best results. It is a <strong>universal<\/strong> technique that can be applied to any model.<\/p>[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/03\/undraw_metrics_gtu7.png&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/03\/undraw_metrics_gtu7.png&#038;#8221<\/a>; title_text=&#8221;undraw_metrics_gtu7&#8243; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">In our example, we use the <a href=\"https:\/\/rubikscode.net\/2020\/08\/10\/back-to-machine-learning-basics-support-vector-machines\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong><em>SVM<\/em> algorithm for classification<\/strong><\/a>. There are three hyperparameters that we take into consideration &#8211; <em>C<\/em>, <em>gamma<\/em> and <em>kernel<\/em>. To understand them in more detail, check out <a href=\"https:\/\/rubikscode.net\/2020\/08\/10\/back-to-machine-learning-basics-support-vector-machines\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>this article<\/strong><\/a>. 
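Conceptually, Grid Search iterates over the Cartesian product of the candidate value lists and keeps the best-scoring combination. A minimal pure-Python sketch, where the tiny grid and the scoring function are hypothetical stand-ins for training and validating a real model:

```python
from itertools import product

# Hypothetical candidate values for each hyperparameter.
param_grid = {"C": [0.1, 1, 100, 1000], "kernel": ["linear", "rbf"]}

def evaluate(params):
    # Stand-in for "train the model and measure the validation score";
    # this toy score simply peaks at C=100.
    return -abs(params["C"] - 100)

# Build every possible combination and keep the best-scoring one.
candidates = [dict(zip(param_grid, combo)) for combo in product(*param_grid.values())]
best_params = max(candidates, key=evaluate)
print(best_params)  # {'C': 100, 'kernel': 'linear'}
```

With 4 values for `C` and 2 kernels, 8 models are trained; the cost grows multiplicatively with every hyperparameter added, which is exactly why grid search gets expensive.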
For <em>C<\/em> we want to check the following values: 0.1, 1, 100, 1000; for <em>gamma<\/em> we use the values: 0.0001, 0.001, 0.005, 0.1, 1, 3, 5; and for <em>kernel<\/em> we use the values: <em>&#8216;linear&#8217;<\/em> and <em>&#8216;rbf&#8217;<\/em>.\u00a0<\/p>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221;]<h3>3.1 Grid Search\u00a0Implementation<\/h3>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">Here is what that looks like in code:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;aHlwZXJwYXJhbWV0ZXJzID0gewogICAgJ0MnOiBbMC4xLCAxLCAxMDAsIDEwMDBdLAogICAgJ2dhbW1hJzogWzAuMDAwMSwgMC4wMDEsIDAuMDA1LCAwLjEsIDEsIDMsIDVdLAogICAgJ2tlcm5lbCc6ICgnbGluZWFyJywgJ3JiZicpCn0=&#8221; _builder_version=&#8221;4.4.6&#8243;]aHlwZXJwYXJhbWV0ZXJzID0gewogICAgJ0MnOiBbMC4xLCAxLCAxMDAsIDEwMDBdLAogICAgJ2dhbW1hJzogWzAuMDAwMSwgMC4wMDEsIDAuMDA1LCAwLjEsIDEsIDMsIDVdLAogICAgJ2tlcm5lbCc6ICgnbGluZWFyJywgJ3JiZicpCn0=[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">We utilize <em>Scikit-Learn<\/em> and its <em>SVC<\/em> class, which contains the implementation of <em>SVM<\/em> for classification. Apart from that, we use the <em><strong>GridSearchCV<\/strong><\/em> class, which performs grid search optimization. 
Combined, that looks like this:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;Z3JpZCA9IEdyaWRTZWFyY2hDVigKICAgICAgICBlc3RpbWF0b3I9U1ZDKCksCiAgICAgICAgcGFyYW1fZ3JpZD1oeXBlcnBhcmFtZXRlcnMsCiAgICAgICAgY3Y9NSwgCglzY29yaW5nPSdmMV9taWNybycsIAoJbl9qb2JzPS0xKQ==&#8221; _builder_version=&#8221;4.4.6&#8243;]Z3JpZCA9IEdyaWRTZWFyY2hDVigKICAgICAgICBlc3RpbWF0b3I9U1ZDKCksCiAgICAgICAgcGFyYW1fZ3JpZD1oeXBlcnBhcmFtZXRlcnMsCiAgICAgICAgY3Y9NSwgCglzY29yaW5nPSdmMV9taWNybycsIAoJbl9qb2JzPS0xKQ==[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p>This class receives several parameters through the constructor:<\/p>\n<ul>\n<li><strong>estimator<\/strong> &#8211; an instance of the machine learning algorithm itself. We pass a new instance of the SVC class here.<\/li>\n<li><strong>param_grid<\/strong>\u00a0&#8211; the dictionary of hyperparameter values.<\/li>\n<li><strong>cv<\/strong> &#8211; Determines the cross-validation splitting strategy.<\/li>\n<li><strong>scoring<\/strong> &#8211; The validation metric used to evaluate the predictions. We use the F1 score.<\/li>\n<li><strong>n_jobs<\/strong> &#8211; Represents the number of jobs to run in parallel. 
The value -1 means that all processors are used.<\/li>\n<\/ul>[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/03\/undraw_key_points_ig28.png&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/03\/undraw_key_points_ig28.png&#038;#8221<\/a>; title_text=&#8221;undraw_key_points_ig28&#8243; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p>The only thing left to do is run the training process by utilizing the <em>fit<\/em> method:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;Z3JpZC5maXQoWF90cmFpbiwgeV90cmFpbik=&#8221; _builder_version=&#8221;4.4.6&#8243;]Z3JpZC5maXQoWF90cmFpbiwgeV90cmFpbik=[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p>Once training is complete, we can check the best hyperparameters and the corresponding score:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;cHJpbnQoZidCZXN0IHBhcmFtZXRlcnM6IHtncmlkLmJlc3RfcGFyYW1zX30nKQpwcmludChmJ0Jlc3Qgc2NvcmU6IHtncmlkLmJlc3Rfc2NvcmVffScp&#8221; _builder_version=&#8221;4.4.6&#8243;]cHJpbnQoZidCZXN0IHBhcmFtZXRlcnM6IHtncmlkLmJlc3RfcGFyYW1zX30nKQpwcmludChmJ0Jlc3Qgc2NvcmU6IHtncmlkLmJlc3Rfc2NvcmVffScp[\/et_pb_dmb_code_snippet][et_pb_dmb_code_snippet code=&#8221;QmVzdCBwYXJhbWV0ZXJzOiB7J0MnOiAxMDAwLCAnZ2FtbWEnOiAwLjEsICdrZXJuZWwnOiAncmJmJ30KQmVzdCBzY29yZTogMC45NjI2ODM0MzgxNTUxMzYxCQk=&#8221; style=&#8221;dark&#8221; _builder_version=&#8221;4.4.6&#8243; body_text_color=&#8221;#dd6c38&#8243;]QmVzdCBwYXJhbWV0ZXJzOiB7J0MnOiAxMDAwLCAnZ2FtbWEnOiAwLjEsICdrZXJuZWwnOiAncmJmJ30KQmVzdCBzY29yZTogMC45NjI2ODM0MzgxNTUxMzYxCQk=[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p>Also, we can print out all the results:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;cHJpbnQoZidBbGwgcmVzdWx0czoge2dyaWQuY3ZfcmVzdWx0c199Jyk=&#8221; 
_builder_version=&#8221;4.4.6&#8243;]cHJpbnQoZidBbGwgcmVzdWx0czoge2dyaWQuY3ZfcmVzdWx0c199Jyk=[\/et_pb_dmb_code_snippet][et_pb_dmb_code_snippet code=&#8221;QWxscmVzdWx0czogeydtZWFuX2ZpdF90aW1lJzogYXJyYXkoWzAuMDA3ODAwMTUsIDAuMDAyODAxNDcsIDAuMDAxMjAwMTUsIDAuMDAyMTk5OTgsIDAuMDI0MDAwNiAsCiAgICAgICAwLjAwNzM5OTQyLCAwLjAwMDU5OTYyLCAwLjAwNjAwMDMzLCAwLjAwMDk5OTQgLCAwLjAwMjc5Nzg5LAogICAgICAgMC4wMDA5OTk2OSwgMC4wMDM0MDExNCwgMC4wMDA1OTk4NiwgMC4wMDI5OTg2NCwgMC4wMDA1OTcgICwKICAgICAgIDAuMDAzNDAwMjMsIDAuMDAxMTk2NTgsIDAuMDAyODAwOTQsIDAuMDAwNjAwNTgsIDAuMDAxNzk5NDQsCiAgICAgICAwLjAwMDk5OTY0LCAwLjAwMDc5OTY2LCAwLjAwMDk5OTE2LCAwLjAwMTAwMDMxLCAwLjAwMDc5OTk5LAogICAgICAgMC4wMDIgICAgICwgMC4wMDA4MDAyMywgMC4wMDIyMDAzNywgMC4wMDExOTk1OCwgMC4wMDE2MDAxMiwKICAgICAgIDAuMDI5Mzk5NjMsIDAuMDAwOTk5NTUsIDAuMDAxMTk5NjMsIDAuMDAxMzk5OTUsIDAuMDAxMDAwNjksCiAgICAgICAwLjAwMTAwMDE3LCAwLjAwMTQwMDUyLCAwLjAwMTE5OTc3LCAwLjAwMDk5OTc0LCAwLjAwMTgwMDA2LAogICAgICAgMC4wMDEwMDMxMiwgMC4wMDE5OTk3NiwgMC4wMDIyMDAwMywgMC4wMDMyMDA5NiwgMC4wMDI0MDAzNSwKICAgICAgIDAuMDAxOTk5ICAsIDAuMDAzMTk5ODIsIDAuMDAxOTk5OTUsIDAuMDAyOTk5MzEsIDAuMDAxOTk5MjgsICAgCi4uLg==&#8221; style=&#8221;dark&#8221; _builder_version=&#8221;4.4.6&#8243; 
body_text_color=&#8221;#dd6c38&#8243;]QWxscmVzdWx0czogeydtZWFuX2ZpdF90aW1lJzogYXJyYXkoWzAuMDA3ODAwMTUsIDAuMDAyODAxNDcsIDAuMDAxMjAwMTUsIDAuMDAyMTk5OTgsIDAuMDI0MDAwNiAsCiAgICAgICAwLjAwNzM5OTQyLCAwLjAwMDU5OTYyLCAwLjAwNjAwMDMzLCAwLjAwMDk5OTQgLCAwLjAwMjc5Nzg5LAogICAgICAgMC4wMDA5OTk2OSwgMC4wMDM0MDExNCwgMC4wMDA1OTk4NiwgMC4wMDI5OTg2NCwgMC4wMDA1OTcgICwKICAgICAgIDAuMDAzNDAwMjMsIDAuMDAxMTk2NTgsIDAuMDAyODAwOTQsIDAuMDAwNjAwNTgsIDAuMDAxNzk5NDQsCiAgICAgICAwLjAwMDk5OTY0LCAwLjAwMDc5OTY2LCAwLjAwMDk5OTE2LCAwLjAwMTAwMDMxLCAwLjAwMDc5OTk5LAogICAgICAgMC4wMDIgICAgICwgMC4wMDA4MDAyMywgMC4wMDIyMDAzNywgMC4wMDExOTk1OCwgMC4wMDE2MDAxMiwKICAgICAgIDAuMDI5Mzk5NjMsIDAuMDAwOTk5NTUsIDAuMDAxMTk5NjMsIDAuMDAxMzk5OTUsIDAuMDAxMDAwNjksCiAgICAgICAwLjAwMTAwMDE3LCAwLjAwMTQwMDUyLCAwLjAwMTE5OTc3LCAwLjAwMDk5OTc0LCAwLjAwMTgwMDA2LAogICAgICAgMC4wMDEwMDMxMiwgMC4wMDE5OTk3NiwgMC4wMDIyMDAwMywgMC4wMDMyMDA5NiwgMC4wMDI0MDAzNSwKICAgICAgIDAuMDAxOTk5ICAsIDAuMDAzMTk5ODIsIDAuMDAxOTk5OTUsIDAuMDAyOTk5MzEsIDAuMDAxOTk5MjgsICAgCi4uLg==[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p>Ok, let&#8217;s now build this model and check how well it performs on the test dataset:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;bW9kZWwgPSBTVkMoQz01MDAsIGdhbW1hID0gMC4xLCBrZXJuZWwgPSAncmJmJykKbW9kZWwuZml0KFhfdHJhaW4sIHlfdHJhaW4pCgpwcmVkaXRpb25zID0gbW9kZWwucHJlZGljdChYX3Rlc3QpCnByaW50KGYxX3Njb3JlKHByZWRpdGlvbnMsIHlfdGVzdCwgYXZlcmFnZT0nbWljcm8nKSk=&#8221; _builder_version=&#8221;4.4.6&#8243;]bW9kZWwgPSBTVkMoQz01MDAsIGdhbW1hID0gMC4xLCBrZXJuZWwgPSAncmJmJykKbW9kZWwuZml0KFhfdHJhaW4sIHlfdHJhaW4pCgpwcmVkaXRpb25zID0gbW9kZWwucHJlZGljdChYX3Rlc3QpCnByaW50KGYxX3Njb3JlKHByZWRpdGlvbnMsIHlfdGVzdCwgYXZlcmFnZT0nbWljcm8nKSk=[\/et_pb_dmb_code_snippet][et_pb_dmb_code_snippet code=&#8221;MC45NzAxNDkyNTM3MzEzNDMz&#8221; style=&#8221;dark&#8221; _builder_version=&#8221;4.4.6&#8243; body_text_color=&#8221;#dd6c38&#8243;]MC45NzAxNDkyNTM3MzEzNDMz[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p>Cool, our model 
with the proposed hyperparameters achieved an F1 score of ~0.97 on the test dataset. Here is what the model looks like when plotted:<\/p>[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/11\/gridsearch.jpg&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/11\/gridsearch.jpg&#038;#8221<\/a>; title_text=&#8221;gridsearch&#8221; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221; hover_enabled=&#8221;0&#8243; module_id=&#8221;random&#8221;]<h2>4. Random Search Hyperparameter Tuning<\/h2>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">Grid search is super simple, but it is also computationally <strong>expensive<\/strong>, especially in the area of <a href=\"https:\/\/rubikscode.net\/deep-learning-for-programmers\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>deep learning<\/strong><\/a>, where training can take a lot of time. Moreover, some hyperparameters tend to be more important than others. That is why the idea of <strong>Random Search<\/strong> was born, introduced in <a href=\"http:\/\/www.jmlr.org\/papers\/volume13\/bergstra12a\/bergstra12a.pdf\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>this paper<\/strong><\/a>. In fact, that study shows that random search is more efficient than grid search for hyperparameter optimization in terms of computing cost. 
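The core idea can be sketched in a few lines of plain Python; the ranges and the number of iterations here are illustrative, not the ones used later in this article:

```python
import random

random.seed(42)

# Instead of a fixed list per hyperparameter, sample n_iter random
# combinations from continuous ranges. The training budget is fixed by
# n_iter, no matter how many hyperparameters we add.
n_iter = 10
candidates = [
    {"C": random.uniform(500, 2000), "gamma": random.uniform(0, 1)}
    for _ in range(n_iter)
]

# Each candidate would then be trained and scored once. Continuous values
# (e.g. C = 1459.14...) become reachable, which a coarse grid would miss.
print(len(candidates))  # 10
```

Because each draw explores a fresh value along every axis, important hyperparameters get many distinct trial values instead of the few grid points they would share with unimportant ones.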
It also allows more precise discovery of good values for the important hyperparameters.<\/p>[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/04\/undraw_programmer_imem.png&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/04\/undraw_programmer_imem.png&#038;#8221<\/a>; title_text=&#8221;undraw_programmer_imem&#8221; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">Just like <em>Grid Search<\/em>, Random Search defines a <strong>space<\/strong> of hyperparameter values, but it selects random combinations to train the model. It\u2019s possible for this approach to miss the optimal combination; however, it surprisingly picks a winning result <strong>more often<\/strong> than not, and in a fraction of the time <em>Grid Search<\/em> takes.\u00a0<\/p>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221;]<h3>4.1 Random Search\u00a0Implementation<\/h3>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">Let&#8217;s see how that works in code. 
Again, we utilize <em>Scikit-Learn&#8217;s<\/em> SVC class, but this time we use the <em><strong>RandomizedSearchCV<\/strong><\/em> class for random search optimization.<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;aHlwZXJwYXJhbWV0ZXJzID0gewogICAgIkMiOiBzdGF0cy51bmlmb3JtKDUwMCwgMTUwMCksCiAgICAiZ2FtbWEiOiBzdGF0cy51bmlmb3JtKDAsIDEpLAogICAgJ2tlcm5lbCc6ICgnbGluZWFyJywgJ3JiZicpCn0KCnJhbmRvbSA9IFJhbmRvbWl6ZWRTZWFyY2hDVigKICAgICAgICAgICAgICAgIGVzdGltYXRvciA9IFNWQygpLCAKICAgICAgICAgICAgICAgIHBhcmFtX2Rpc3RyaWJ1dGlvbnMgPSBoeXBlcnBhcmFtZXRlcnMsIAogICAgICAgICAgICAgICAgbl9pdGVyID0gMTAwLCAKICAgICAgICAgICAgICAgIGN2ID0gMywgCiAgICAgICAgICAgICAgICByYW5kb21fc3RhdGU9NDIsIAogICAgICAgICAgICAgICAgbl9qb2JzID0gLTEpCgpyYW5kb20uZml0KFhfdHJhaW4sIHlfdHJhaW4p&#8221; _builder_version=&#8221;4.4.6&#8243;]aHlwZXJwYXJhbWV0ZXJzID0gewogICAgIkMiOiBzdGF0cy51bmlmb3JtKDUwMCwgMTUwMCksCiAgICAiZ2FtbWEiOiBzdGF0cy51bmlmb3JtKDAsIDEpLAogICAgJ2tlcm5lbCc6ICgnbGluZWFyJywgJ3JiZicpCn0KCnJhbmRvbSA9IFJhbmRvbWl6ZWRTZWFyY2hDVigKICAgICAgICAgICAgICAgIGVzdGltYXRvciA9IFNWQygpLCAKICAgICAgICAgICAgICAgIHBhcmFtX2Rpc3RyaWJ1dGlvbnMgPSBoeXBlcnBhcmFtZXRlcnMsIAogICAgICAgICAgICAgICAgbl9pdGVyID0gMTAwLCAKICAgICAgICAgICAgICAgIGN2ID0gMywgCiAgICAgICAgICAgICAgICByYW5kb21fc3RhdGU9NDIsIAogICAgICAgICAgICAgICAgbl9qb2JzID0gLTEpCgpyYW5kb20uZml0KFhfdHJhaW4sIHlfdHJhaW4p[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p>Note that we used a uniform distribution for <em>C<\/em> and <em>gamma<\/em>. 
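One detail worth spelling out: SciPy's <em>stats.uniform(loc, scale)</em> is parameterized by location and scale, so <em>stats.uniform(500, 1500)</em> draws from the interval [500, 500 + 1500] = [500, 2000], not [500, 1500]. A quick check:

```python
from scipy import stats

# stats.uniform(loc, scale) covers [loc, loc + scale].
c_dist = stats.uniform(500, 1500)
samples = c_dist.rvs(size=10_000, random_state=42)

print(samples.min() >= 500, samples.max() <= 2000)  # True True
```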
Again, we can print out the results:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;cHJpbnQoZidCZXN0IHBhcmFtZXRlcnM6IHtyYW5kb20uYmVzdF9wYXJhbXNffScpCnByaW50KGYnQmVzdCBzY29yZToge3JhbmRvbS5iZXN0X3Njb3JlX30nKQ==&#8221; _builder_version=&#8221;4.4.6&#8243;]cHJpbnQoZidCZXN0IHBhcmFtZXRlcnM6IHtyYW5kb20uYmVzdF9wYXJhbXNffScpCnByaW50KGYnQmVzdCBzY29yZToge3JhbmRvbS5iZXN0X3Njb3JlX30nKQ==[\/et_pb_dmb_code_snippet][et_pb_dmb_code_snippet code=&#8221;QmVzdCBwYXJhbWV0ZXJzOiB7J0MnOiA1MTAuNTk5NDU3ODI5NTc2MSwgJ2dhbW1hJzogMC4wMjMwNjI0MjUwNDE0MTU3NTcsICdrZXJuZWwnOiAnbGluZWFyJ30KQmVzdCBzY29yZTogMC45NzAwMzc0NTMxODM1MjA1&#8243; style=&#8221;dark&#8221; _builder_version=&#8221;4.4.6&#8243; body_text_color=&#8221;#dd6c38&#8243;]QmVzdCBwYXJhbWV0ZXJzOiB7J0MnOiA1MTAuNTk5NDU3ODI5NTc2MSwgJ2dhbW1hJzogMC4wMjMwNjI0MjUwNDE0MTU3NTcsICdrZXJuZWwnOiAnbGluZWFyJ30KQmVzdCBzY29yZTogMC45NzAwMzc0NTMxODM1MjA1[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p>Note that we got close, but different results than when we used<em> Grid Search<\/em>. The value of the hyperparameter <em>C<\/em> was 500 with <em>Grid Search<\/em>, while with <em>Random Search<\/em> we got 510.59. From this alone, you can see the benefit of Random Search, since it is unlikely that we would have <strong>put<\/strong> this value in the grid search list. Similarly for the <em>gamma,<\/em> we got 0.023 for <em>Random Search<\/em> against 0.1 for <em>Grid Search<\/em>. What is really surprising is that Random Search picked the <strong>linear<\/strong> kernel and not RBF, and that it got a higher <em>F1 Score<\/em> with it.
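The benefit mentioned above can be illustrated with a toy count (an illustrative sketch, not part of the original experiment): with the same budget of 16 trials, a 4x4 grid tests only 4 distinct values per hyperparameter, while random search tests 16 distinct values of each.

```python
import random

# Grid search: 16 trials over a 4x4 grid, so only 4 distinct C values.
grid_C = [1, 10, 100, 1000]
grid_gamma = [0.001, 0.01, 0.1, 1]
grid_trials = [(C, g) for C in grid_C for g in grid_gamma]

# Random search: the same 16 trials sample fresh values every time.
random.seed(42)
random_trials = [(random.uniform(1, 1000), random.uniform(0.001, 1))
                 for _ in range(16)]

distinct_grid_C = len({C for C, _ in grid_trials})      # 4
distinct_random_C = len({C for C, _ in random_trials})  # 16
print(distinct_grid_C, distinct_random_C)
```

The more a hyperparameter matters, the more this extra per-axis coverage pays off.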
To print all results we use the <em>cv_results_<\/em> attribute:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;cHJpbnQoZidBbGwgcmVzdWx0czoge3JhbmRvbS5jdl9yZXN1bHRzX30nKQ==&#8221; _builder_version=&#8221;4.4.6&#8243;]cHJpbnQoZidBbGwgcmVzdWx0czoge3JhbmRvbS5jdl9yZXN1bHRzX30nKQ==[\/et_pb_dmb_code_snippet][et_pb_dmb_code_snippet code=&#8221;QWxscmVzdWx0czogeydtZWFuX2ZpdF90aW1lJzogYXJyYXkoWzAuMDAyMDAwNjUsIDAuMDAyMzM0MDQsIDAuMDAxMDA0NTQsIDAuMDAyMzM3NzcsIDAuMDAxMDAwMDksCiAgICAgICAwLjAwMDMzMzM5LCAwLjAwMDk5NzE1LCAwLjAwMTMyOTQyLCAwLjAwMDk5OTIxLCAwLjAwMDY2NzI1LAogICAgICAgMC4wMDI2NjU2OCwgMC4wMDIzMzM0OCwgMC4wMDIzMzMwMSwgMC4wMDA2NjY3ICwgMC4wMDIzMzI4NSwKICAgICAgIDAuMDAxMDAwMDEsIDAuMDAwOTk5OTMsIDAuMDAwMzMzMzEsIDAuMDAxNjY3NDIsIDAuMDAyMzMzNjQsCiAgICAgICAwLjAwMTk5OTE0LCAwLjAwNDMzMjg2LCAwLjAwMzk5OTE1LCAwLjAwMjAwMDQ5LCAwLjAxMDMzMzM4LAogICAgICAgMC4wMDEwMDM0MiwgMC4wMDI5OTk3ICwgMC4wMDE2NjY1NSwgMC4wMDE2NjcyNiwgMC4wMDEzMzQwMywKICAgICAgIDAuMDAyMzMyOTMsIDAuMDAxMzM3MjksIDAuMDAxMDAwMDksIDAuMDAwNjY2NjIsIDAuMDAwNjY2NDYsCgkgICAKCSAgLi4uLg==&#8221; style=&#8221;dark&#8221; _builder_version=&#8221;4.4.6&#8243; body_text_color=&#8221;#dd6c38&#8243;]QWxscmVzdWx0czogeydtZWFuX2ZpdF90aW1lJzogYXJyYXkoWzAuMDAyMDAwNjUsIDAuMDAyMzM0MDQsIDAuMDAxMDA0NTQsIDAuMDAyMzM3NzcsIDAuMDAxMDAwMDksCiAgICAgICAwLjAwMDMzMzM5LCAwLjAwMDk5NzE1LCAwLjAwMTMyOTQyLCAwLjAwMDk5OTIxLCAwLjAwMDY2NzI1LAogICAgICAgMC4wMDI2NjU2OCwgMC4wMDIzMzM0OCwgMC4wMDIzMzMwMSwgMC4wMDA2NjY3ICwgMC4wMDIzMzI4NSwKICAgICAgIDAuMDAxMDAwMDEsIDAuMDAwOTk5OTMsIDAuMDAwMzMzMzEsIDAuMDAxNjY3NDIsIDAuMDAyMzMzNjQsCiAgICAgICAwLjAwMTk5OTE0LCAwLjAwNDMzMjg2LCAwLjAwMzk5OTE1LCAwLjAwMjAwMDQ5LCAwLjAxMDMzMzM4LAogICAgICAgMC4wMDEwMDM0MiwgMC4wMDI5OTk3ICwgMC4wMDE2NjY1NSwgMC4wMDE2NjcyNiwgMC4wMDEzMzQwMywKICAgICAgIDAuMDAyMzMyOTMsIDAuMDAxMzM3MjksIDAuMDAxMDAwMDksIDAuMDAwNjY2NjIsIDAuMDAwNjY2NDYsCgkgICAKCSAgLi4uLg==[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p>Let&#8217;s do the same thing as we did for Grid Search: create the model with the proposed
hyperparameters, check the score on the test dataset and plot out the model.\u00a0<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;bW9kZWwgPSBTVkMoQz01MTAuNTk5NDU3ODI5NTc2MSwgZ2FtbWEgPSAwLjAyMzA2MjQyNTA0MTQxNTc1Nywga2VybmVsID0gJ2xpbmVhcicpCm1vZGVsLmZpdChYX3RyYWluLCB5X3RyYWluKQoKcHJlZGl0aW9ucyA9IG1vZGVsLnByZWRpY3QoWF90ZXN0KQpwcmludChmMV9zY29yZShwcmVkaXRpb25zLCB5X3Rlc3QsIGF2ZXJhZ2U9J21pY3JvJykp&#8221; _builder_version=&#8221;4.4.6&#8243;]bW9kZWwgPSBTVkMoQz01MTAuNTk5NDU3ODI5NTc2MSwgZ2FtbWEgPSAwLjAyMzA2MjQyNTA0MTQxNTc1Nywga2VybmVsID0gJ2xpbmVhcicpCm1vZGVsLmZpdChYX3RyYWluLCB5X3RyYWluKQoKcHJlZGl0aW9ucyA9IG1vZGVsLnByZWRpY3QoWF90ZXN0KQpwcmludChmMV9zY29yZShwcmVkaXRpb25zLCB5X3Rlc3QsIGF2ZXJhZ2U9J21pY3JvJykp[\/et_pb_dmb_code_snippet][et_pb_dmb_code_snippet code=&#8221;MC45NzAxNDkyNTM3MzEzNDMz&#8243; style=&#8221;dark&#8221; _builder_version=&#8221;4.4.6&#8243; body_text_color=&#8221;#dd6c38&#8243;]MC45NzAxNDkyNTM3MzEzNDMz[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p>Wow, the F1 score on the test dataset is exactly the same as when we used Grid Search. Check out the model:<\/p>[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/11\/randomsearch.jpg&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/11\/randomsearch.jpg&#038;#8221<\/a>; title_text=&#8221;randomsearch&#8221; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221; hover_enabled=&#8221;0&#8243; module_id=&#8221;bay&#8221;]<h2 role=\"textbox\" aria-multiline=\"true\" class=\"rich-text editor-rich-text__editable block-editor-rich-text__editable is-selected\" contenteditable=\"true\" aria-label=\"Write heading\u2026\">5.
Bayesian Hyperparameter Optimization<\/h2>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">A really cool fact about the previous two algorithms is that all the experiments with various hyperparameter values can be run in <strong>parallel<\/strong>. This can save us a lot of time. However, this is also their biggest shortcoming: since every experiment is run in <strong>isolation<\/strong>, we can not use the <strong>information<\/strong> from past experiments in the current one. There is a whole field dedicated to the problem of sequential optimization &#8211; <strong>sequential model-based optimization<\/strong> (SMBO). Algorithms explored in this field use previous experiments and observations of the loss function; based on them, they try to determine the next optimal point. One such algorithm is <strong>Bayesian Optimization<\/strong>.<\/p>[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/02\/tree.png&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/02\/tree.png&#038;#8221<\/a>; alt=&#8221;Decision Tree&#8221; title_text=&#8221;tree&#8221; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p>Just like other algorithms from the <em>SMBO<\/em> group, Bayesian optimization uses previously evaluated points (in this case those are hyperparameter values, but we can generalize) to compute the posterior expectation of what the loss function looks like. This algorithm uses two important math concepts &#8211; the <strong>Gaussian process<\/strong> and the <strong>acquisition function<\/strong>. Just as a <em>Gaussian distribution<\/em> is defined over random variables, a <em>Gaussian process<\/em> is its <strong>generalization<\/strong> over functions.
Just like a Gaussian distribution has a <strong>mean value<\/strong> and <strong>covariance<\/strong>, the <em>Gaussian process<\/em> is described by a <strong>mean function<\/strong> and a <strong>covariance function<\/strong>.<\/p>\n<p><strong>The acquisition function<\/strong> is the function we use to decide where the loss function should be evaluated next. One way to think of it is as a loss function for the loss function. It is a function of the posterior distribution over the loss function that describes the utility of all values of the hyperparameters. The most popular acquisition function is <strong>expected improvement<\/strong>:<\/p>[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/11\/3.jpg&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/11\/3.jpg&#038;#8221<\/a>; title_text=&#8221;3&#8243; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;340px&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p>where f is the loss function and x&#8217; is the current optimal set of hyperparameters.
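Under a Gaussian posterior, expected improvement has a closed form. The sketch below is a hedged illustration for a minimization problem, where mu and sigma stand for the posterior mean and standard deviation at a candidate point:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    # Closed-form EI for minimization: E[max(f_best - f(x), 0)]
    # under a Gaussian posterior N(mu, sigma^2) at the candidate x.
    sigma = np.maximum(sigma, 1e-12)  # guard against division by zero
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# A candidate whose posterior mean beats the current best loss, or one
# with high uncertainty, gets a large EI; a certainly-worse one gets ~0.
print(expected_improvement(mu=0.2, sigma=0.1, f_best=0.3))
print(expected_improvement(mu=0.5, sigma=1e-6, f_best=0.3))
```

Note how the two terms trade off exploitation (a low posterior mean) against exploration (a large posterior uncertainty).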
When we put it all together, Bayesian optimization is done in 3 steps:<\/p>\n<ul>\n<li>Using previously evaluated points of the loss function, the <strong>posterior expectation<\/strong> is calculated using a <strong>Gaussian Process<\/strong>.<\/li>\n<li>A <strong>new set of points<\/strong> that maximizes expected improvement is chosen.<\/li>\n<li>The loss function of the newly selected points is <strong>calculated<\/strong>.<\/li>\n<\/ul>[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/03\/undraw_progress_data_4ebj.png&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/03\/undraw_progress_data_4ebj.png&#038;#8221<\/a>; alt=&#8221;Decision Tree&#8221; title_text=&#8221;undraw_progress_data_4ebj&#8221; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221; hover_enabled=&#8221;0&#8243;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221;]<h3>5.1 Bayesian Optimization Implementation<\/h3>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p>The easiest way to bring this to code is by using the <em>Sci-Kit optimization<\/em> library, often called <strong>skopt<\/strong>.
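The three-step loop itself can be sketched in a few lines. This is a toy illustration on a 1-D quadratic loss (not the article's SVC example), using scikit-learn's GaussianProcessRegressor as the surrogate with the expected-improvement formula inlined:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(42)

def loss(x):
    return (x - 0.3) ** 2  # toy 1-D loss with its minimum at x = 0.3

# Start from a few random evaluations.
X = rng.uniform(0, 1, size=(3, 1))
y = loss(X).ravel()

for _ in range(10):
    # Step 1: posterior expectation of the loss via a Gaussian process.
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    candidates = rng.uniform(0, 1, size=(200, 1))
    mu, sigma = gp.predict(candidates, return_std=True)
    # Step 2: choose the candidate that maximizes expected improvement.
    f_best = y.min()
    sigma = np.maximum(sigma, 1e-12)
    z = (f_best - mu) / sigma
    ei = (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = candidates[np.argmax(ei)]
    # Step 3: evaluate the loss at the newly selected point.
    X = np.vstack([X, x_next])
    y = np.append(y, loss(x_next))

print(y.min())  # best loss found so far
```

skopt wraps exactly this kind of loop behind a scikit-learn-style search interface, as shown next.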
Following the process that we used in the previous examples, we can do the following:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;aHlwZXJwYXJhbWV0ZXJzID0gewogICAgIkMiOiBSZWFsKDFlLTYsIDFlKzYsIHByaW9yPSdsb2ctdW5pZm9ybScpLAogICAgImdhbW1hIjogUmVhbCgxZS02LCAxZSsxLCBwcmlvcj0nbG9nLXVuaWZvcm0nKSwKICAgICJrZXJuZWwiOiBDYXRlZ29yaWNhbChbJ2xpbmVhcicsICdyYmYnXSksCn0KCmJheWVzaWFuID0gQmF5ZXNTZWFyY2hDVigKICAgICAgICAgICAgICAgIGVzdGltYXRvciA9IFNWQygpLCAKICAgICAgICAgICAgICAgIHNlYXJjaF9zcGFjZXMgPSBoeXBlcnBhcmFtZXRlcnMsIAogICAgICAgICAgICAgICAgbl9pdGVyID0gMTAwLCAKICAgICAgICAgICAgICAgIGN2ID0gNSwgCiAgICAgICAgICAgICAgICByYW5kb21fc3RhdGU9NDIsIAogICAgICAgICAgICAgICAgbl9qb2JzID0gLTEpCgpiYXllc2lhbi5maXQoWF90cmFpbiwgeV90cmFpbik=&#8221; _builder_version=&#8221;4.4.6&#8243;]aHlwZXJwYXJhbWV0ZXJzID0gewogICAgIkMiOiBSZWFsKDFlLTYsIDFlKzYsIHByaW9yPSdsb2ctdW5pZm9ybScpLAogICAgImdhbW1hIjogUmVhbCgxZS02LCAxZSsxLCBwcmlvcj0nbG9nLXVuaWZvcm0nKSwKICAgICJrZXJuZWwiOiBDYXRlZ29yaWNhbChbJ2xpbmVhcicsICdyYmYnXSksCn0KCmJheWVzaWFuID0gQmF5ZXNTZWFyY2hDVigKICAgICAgICAgICAgICAgIGVzdGltYXRvciA9IFNWQygpLCAKICAgICAgICAgICAgICAgIHNlYXJjaF9zcGFjZXMgPSBoeXBlcnBhcmFtZXRlcnMsIAogICAgICAgICAgICAgICAgbl9pdGVyID0gMTAwLCAKICAgICAgICAgICAgICAgIGN2ID0gNSwgCiAgICAgICAgICAgICAgICByYW5kb21fc3RhdGU9NDIsIAogICAgICAgICAgICAgICAgbl9qb2JzID0gLTEpCgpiYXllc2lhbi5maXQoWF90cmFpbiwgeV90cmFpbik=[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">Again, we defined a dictionary for the <strong>set<\/strong> of hyperparameters. Note that we used the <em>Real<\/em> and <em>Categorical<\/em> classes from the <em>Sci-Kit Optimisation<\/em> library. Then we utilize the <em><strong>BayesSearchCV<\/strong><\/em> class in the same way we used <em>GridSearchCV<\/em> or <em>RandomizedSearchCV<\/em>.
After the training is done, we can print out the best results:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;cHJpbnQoZidCZXN0IHBhcmFtZXRlcnM6IHtiYXllc2lhbi5iZXN0X3BhcmFtc199JykKcHJpbnQoZidCZXN0IHNjb3JlOiB7YmF5ZXNpYW4uYmVzdF9zY29yZV99Jyk=&#8221; _builder_version=&#8221;4.4.6&#8243;]cHJpbnQoZidCZXN0IHBhcmFtZXRlcnM6IHtiYXllc2lhbi5iZXN0X3BhcmFtc199JykKcHJpbnQoZidCZXN0IHNjb3JlOiB7YmF5ZXNpYW4uYmVzdF9zY29yZV99Jyk=[\/et_pb_dmb_code_snippet][et_pb_dmb_code_snippet code=&#8221;QmVzdCBwYXJhbWV0ZXJzOiAKT3JkZXJlZERpY3QoWygnQycsIDM5MzIuMjUxNjEzMzA4NiksICgnZ2FtbWEnLCAwLjAwMTE2NDY3Mzc5Nzg3MzA0NDcpLCAoJ2tlcm5lbCcsICdyYmYnKV0pCkJlc3Qgc2NvcmU6IDAuOTYyNTQ2ODE2NDc5NDAwOA==&#8243; style=&#8221;dark&#8221; _builder_version=&#8221;4.4.6&#8243; body_text_color=&#8221;#dd6c38&#8243;]QmVzdCBwYXJhbWV0ZXJzOiAKT3JkZXJlZERpY3QoWygnQycsIDM5MzIuMjUxNjEzMzA4NiksICgnZ2FtbWEnLCAwLjAwMTE2NDY3Mzc5Nzg3MzA0NDcpLCAoJ2tlcm5lbCcsICdyYmYnKV0pCkJlc3Qgc2NvcmU6IDAuOTYyNTQ2ODE2NDc5NDAwOA==[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">It is interesting, isn&#8217;t it? We got quite different results using this optimization. The validation score is a bit lower than when we used Random Search.
We can even print out all results:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;cHJpbnQoZidBbGwgcmVzdWx0czoge2JheWVzaWFuLmN2X3Jlc3VsdHNffScp&#8221; _builder_version=&#8221;4.4.6&#8243;]cHJpbnQoZidBbGwgcmVzdWx0czoge2JheWVzaWFuLmN2X3Jlc3VsdHNffScp[\/et_pb_dmb_code_snippet][et_pb_dmb_code_snippet code=&#8221;QWxsIHJlc3VsdHM6IGRlZmF1bHRkaWN0KDxjbGFzcyAnbGlzdCc+LCB7J3NwbGl0MF90ZXN0X3Njb3JlJzogWzAuOTYyOTYyOTYyOTYyOTYyOSwKICAwLjk0NDQ0NDQ0NDQ0NDQ0NDQsIDAuOTQ0NDQ0NDQ0NDQ0NDQ0NCwgMC45NDQ0NDQ0NDQ0NDQ0NDQ0LCAgMC45NDQ0NDQ0NDQ0NDQ0NDQ0LAogIDAuOTQ0NDQ0NDQ0NDQ0NDQ0NCwgMC45NDQ0NDQ0NDQ0NDQ0NDQ0LCAwLjk0NDQ0NDQ0NDQ0NDQ0NDQsIDAuNDYyOTYyOTYyOTYyOTYyOTcsCiAgMC45NDQ0NDQ0NDQ0NDQ0NDQ0LCAwLjg3MDM3MDM3MDM3MDM3MDMsIDAuOTQ0NDQ0NDQ0NDQ0NDQ0NCwgMC45NDQ0NDQ0NDQ0NDQ0NDQ0LCAKICAwLjk0NDQ0NDQ0NDQ0NDQ0NDQsIDAuOTQ0NDQ0NDQ0NDQ0NDQ0NCwgMC45NDQ0NDQ0NDQ0NDQ0NDQ0LCAwLjk0NDQ0NDQ0NDQ0NDQ0NDQsIAogIC4uLi4u&#8221; style=&#8221;dark&#8221; _builder_version=&#8221;4.4.6&#8243; body_text_color=&#8221;#dd6c38&#8243;]QWxsIHJlc3VsdHM6IGRlZmF1bHRkaWN0KDxjbGFzcyAnbGlzdCc+LCB7J3NwbGl0MF90ZXN0X3Njb3JlJzogWzAuOTYyOTYyOTYyOTYyOTYyOSwKICAwLjk0NDQ0NDQ0NDQ0NDQ0NDQsIDAuOTQ0NDQ0NDQ0NDQ0NDQ0NCwgMC45NDQ0NDQ0NDQ0NDQ0NDQ0LCAgMC45NDQ0NDQ0NDQ0NDQ0NDQ0LAogIDAuOTQ0NDQ0NDQ0NDQ0NDQ0NCwgMC45NDQ0NDQ0NDQ0NDQ0NDQ0LCAwLjk0NDQ0NDQ0NDQ0NDQ0NDQsIDAuNDYyOTYyOTYyOTYyOTYyOTcsCiAgMC45NDQ0NDQ0NDQ0NDQ0NDQ0LCAwLjg3MDM3MDM3MDM3MDM3MDMsIDAuOTQ0NDQ0NDQ0NDQ0NDQ0NCwgMC45NDQ0NDQ0NDQ0NDQ0NDQ0LCAKICAwLjk0NDQ0NDQ0NDQ0NDQ0NDQsIDAuOTQ0NDQ0NDQ0NDQ0NDQ0NCwgMC45NDQ0NDQ0NDQ0NDQ0NDQ0LCAwLjk0NDQ0NDQ0NDQ0NDQ0NDQsIAogIC4uLi4u[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">How does the model with these hyperparameters perform on the test dataset?
Let&#8217;s find out:<\/p>[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;bW9kZWwgPSBTVkMoQz0zOTMyLjI1MTYxMzMwODYsIGdhbW1hID0gMC4wMDExNjQ2NzM3OTc4NzMwNDQ3LCBrZXJuZWwgPSAncmJmJykKbW9kZWwuZml0KFhfdHJhaW4sIHlfdHJhaW4pCgpwcmVkaXRpb25zID0gbW9kZWwucHJlZGljdChYX3Rlc3QpCnByaW50KGYxX3Njb3JlKHByZWRpdGlvbnMsIHlfdGVzdCwgYXZlcmFnZT0nbWljcm8nKSk=&#8221; _builder_version=&#8221;4.4.6&#8243;]bW9kZWwgPSBTVkMoQz0zOTMyLjI1MTYxMzMwODYsIGdhbW1hID0gMC4wMDExNjQ2NzM3OTc4NzMwNDQ3LCBrZXJuZWwgPSAncmJmJykKbW9kZWwuZml0KFhfdHJhaW4sIHlfdHJhaW4pCgpwcmVkaXRpb25zID0gbW9kZWwucHJlZGljdChYX3Rlc3QpCnByaW50KGYxX3Njb3JlKHByZWRpdGlvbnMsIHlfdGVzdCwgYXZlcmFnZT0nbWljcm8nKSk=[\/et_pb_dmb_code_snippet][et_pb_dmb_code_snippet code=&#8221;MC45ODUwNzQ2MjY4NjU2NzE2&#8243; style=&#8221;dark&#8221; _builder_version=&#8221;4.4.6&#8243; body_text_color=&#8221;#dd6c38&#8243;]MC45ODUwNzQ2MjY4NjU2NzE2[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">This is super interesting. We got a better score on the test dataset even though we got worse results on the validation dataset.
Here is the model:<\/p>[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/11\/BayesianOptimization.jpg&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/11\/BayesianOptimization.jpg&#038;#8221<\/a>; alt=&#8221;Decision Tree&#8221; title_text=&#8221;BayesianOptimization&#8221; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">Just for fun, let&#8217;s put all these models side by side:<\/p>[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/11\/Untitled.png&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2020\/11\/Untitled.png&#038;#8221<\/a>; alt=&#8221;Decision Tree&#8221; title_text=&#8221;Untitled&#8221; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221; hover_enabled=&#8221;0&#8243; module_id=&#8221;halving&#8221;]<h2 role=\"textbox\" aria-multiline=\"true\" class=\"rich-text editor-rich-text__editable block-editor-rich-text__editable is-selected\" contenteditable=\"true\" aria-label=\"Write heading\u2026\">6.\u00a0Halving Grid Search &amp; Halving Random Search<\/h2>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243; hover_enabled=&#8221;0&#8243;]<p style=\"text-align: justify;\">A couple of months ago, Sci-Kit Learn introduced two new classes: HalvingGridSearchCV and HalvingRandomSearchCV. The authors claimed that these two classes &#8220;can be much faster at finding a good parameter combination&#8221;. These classes perform a search over specified parameter values with successive halving.
This technique starts evaluating all the candidates with a small number of resources and iteratively selects the best candidates, using more and more resources.<\/p>\n[\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2021\/07\/undraw_Chat_bot_re_e2gj.png&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2021\/07\/undraw_Chat_bot_re_e2gj.png&#038;#8221<\/a>; title_text=&#8221;AI Visual&#8221; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221; alt=&#8221;AI Visual&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243; hover_enabled=&#8221;0&#8243;]<p style=\"text-align: justify;\">From the point of view of Halving Grid Search, this means that in the first iteration all candidates are trained on a small amount of training data. The next iteration includes only the candidates that performed best in the previous iteration. These models get more resources, i.e. more training data, and are evaluated again. This process continues, with Halving Grid Search keeping only the best candidates from the previous iterations, until there is only one left.<\/p>\n<p style=\"text-align: justify;\">This whole process is controlled by two arguments \u2014 min_resources and factor. The first argument, min_resources, represents the amount of data that the process starts with. With each iteration, the amount of data is multiplied by factor. The process is similar for HalvingRandomSearchCV.<\/p>\n[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221;]<h3>6.1 Halving Grid Search &amp; Halving Random Search Implementation<\/h3>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243; hover_enabled=&#8221;0&#8243;]<p style=\"text-align: justify;\">The code is similar to the previous examples; we just use different classes.
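One caveat before the code: at the time of writing these classes are still experimental, so scikit-learn requires an explicit enabling import before they can be used (a small sketch, assuming scikit-learn 0.24 or newer):

```python
# The halving estimators live behind an experimental flag; importing
# enable_halving_search_cv makes them available for import.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV, HalvingRandomSearchCV

print(HalvingGridSearchCV.__name__, HalvingRandomSearchCV.__name__)
```

Without the first import, the second one raises an ImportError.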
Let&#8217;s start with <em>HalvingGridSearchCV<\/em>:<\/p>\n[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;aHlwZXJwYXJhbWV0ZXJzID0gewogICAgJ0MnOiBbMC4xLCAxLCAxMDAsIDUwMCwgMTAwMF0sCiAgICAnZ2FtbWEnOiBbMC4wMDAxLCAwLjAwMSwgMC4wMSwgMC4wMDUsIDAuMSwgMSwgMywgNV0sCiAgICAna2VybmVsJzogKCdsaW5lYXInLCAncmJmJykKfQoKCgpncmlkID0gSGFsdmluZ0dyaWRTZWFyY2hDVigKICAgICAgICBlc3RpbWF0b3I9U1ZDKCksCiAgICAgICAgcGFyYW1fZ3JpZD1oeXBlcnBhcmFtZXRlcnMsCiAgICAgICAgY3Y9NSwgCiAgICAgICAgc2NvcmluZz0nZjFfbWljcm8nLCAKICAgICAgICBuX2pvYnM9LTEpCgpncmlkLmZpdChYX3RyYWluLCB5X3RyYWluKQ==&#8221; _builder_version=&#8221;4.4.6&#8243; hover_enabled=&#8221;0&#8243;]aHlwZXJwYXJhbWV0ZXJzID0gewogICAgJ0MnOiBbMC4xLCAxLCAxMDAsIDUwMCwgMTAwMF0sCiAgICAnZ2FtbWEnOiBbMC4wMDAxLCAwLjAwMSwgMC4wMSwgMC4wMDUsIDAuMSwgMSwgMywgNV0sCiAgICAna2VybmVsJzogKCdsaW5lYXInLCAncmJmJykKfQoKCgpncmlkID0gSGFsdmluZ0dyaWRTZWFyY2hDVigKICAgICAgICBlc3RpbWF0b3I9U1ZDKCksCiAgICAgICAgcGFyYW1fZ3JpZD1oeXBlcnBhcmFtZXRlcnMsCiAgICAgICAgY3Y9NSwgCiAgICAgICAgc2NvcmluZz0nZjFfbWljcm8nLCAKICAgICAgICBuX2pvYnM9LTEpCgpncmlkLmZpdChYX3RyYWluLCB5X3RyYWluKQ==[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243; hover_enabled=&#8221;0&#8243;]<p style=\"text-align: justify;\">The interesting thing is that this code ran in just 0.7 seconds. In comparison, the same code with the\u00a0<em>GridSearchCV\u00a0<\/em>class lasted 3.6 seconds. That is much faster.
The results are a bit different though:<\/p>\n[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;cHJpbnQoZidCZXN0IHBhcmFtZXRlcnM6IHtncmlkLmJlc3RfcGFyYW1zX30nKQpwcmludChmJ0Jlc3Qgc2NvcmU6IHtncmlkLmJlc3Rfc2NvcmVffScp&#8221; _builder_version=&#8221;4.4.6&#8243; hover_enabled=&#8221;0&#8243; locked=&#8221;off&#8221;]cHJpbnQoZidCZXN0IHBhcmFtZXRlcnM6IHtncmlkLmJlc3RfcGFyYW1zX30nKQpwcmludChmJ0Jlc3Qgc2NvcmU6IHtncmlkLmJlc3Rfc2NvcmVffScp[\/et_pb_dmb_code_snippet][et_pb_dmb_code_snippet code=&#8221;QmVzdCBwYXJhbWV0ZXJzOiB7J0MnOiA1MDAsICdnYW1tYSc6IDAuMDA1LCAna2VybmVsJzogJ3JiZid9CkJlc3Qgc2NvcmU6IDAuOTUyOTQxMTc2NDcwNTg4Mg==&#8221; style=&#8221;dark&#8221; _builder_version=&#8221;4.4.6&#8243; body_text_color=&#8221;#dd6c38&#8243; hover_enabled=&#8221;0&#8243; locked=&#8221;off&#8221;]QmVzdCBwYXJhbWV0ZXJzOiB7J0MnOiA1MDAsICdnYW1tYSc6IDAuMDA1LCAna2VybmVsJzogJ3JiZid9CkJlc3Qgc2NvcmU6IDAuOTUyOTQxMTc2NDcwNTg4Mg==[\/et_pb_dmb_code_snippet][et_pb_text _builder_version=&#8221;4.4.6&#8243; hover_enabled=&#8221;0&#8243;]<p style=\"text-align: justify;\">We got similar results, but not the same. 
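If you want to see the elimination schedule itself, the fitted search exposes it through the n_candidates_ and n_resources_ attributes. A self-contained sketch on the iris dataset (illustrative values, not the article's data):

```python
from sklearn.datasets import load_iris
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

search = HalvingGridSearchCV(
    estimator=SVC(),
    param_grid={'C': [0.1, 1, 100, 500], 'gamma': [0.001, 0.01, 0.1, 1]},
    factor=2,            # halve the candidates each iteration...
    min_resources=30,    # ...starting from 30 training samples
    cv=3,
).fit(X, y)

# One entry per iteration: surviving candidates and samples used.
print(search.n_candidates_)  # starts at 16 (the full 4x4 grid)
print(search.n_resources_)   # starts at 30, multiplied by factor each time
```

Inspecting these lists makes it easy to check that min_resources and factor produce the trade-off you expect before running a bigger search.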
If we create a model with these values, we will get the following accuracy and graph:<\/p>\n<p style=\"text-align: justify;\">\n[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;bW9kZWwgPSBTVkMoQz01MDAsIGdhbW1hID0gMC4wMDUsIGtlcm5lbCA9ICdyYmYnKQptb2RlbC5maXQoWF90cmFpbiwgeV90cmFpbikKCnByZWRpdGlvbnMgPSBtb2RlbC5wcmVkaWN0KFhfdGVzdCkKcHJpbnQoZjFfc2NvcmUocHJlZGl0aW9ucywgeV90ZXN0LCBhdmVyYWdlPSdtaWNybycpKQ==&#8221; _builder_version=&#8221;4.4.6&#8243; hover_enabled=&#8221;0&#8243; locked=&#8221;off&#8221;]bW9kZWwgPSBTVkMoQz01MDAsIGdhbW1hID0gMC4wMDUsIGtlcm5lbCA9ICdyYmYnKQptb2RlbC5maXQoWF90cmFpbiwgeV90cmFpbikKCnByZWRpdGlvbnMgPSBtb2RlbC5wcmVkaWN0KFhfdGVzdCkKcHJpbnQoZjFfc2NvcmUocHJlZGl0aW9ucywgeV90ZXN0LCBhdmVyYWdlPSdtaWNybycpKQ==[\/et_pb_dmb_code_snippet][et_pb_dmb_code_snippet code=&#8221;MC45ODUwNzQ2MjY4NjU2NzE2&#8243; style=&#8221;dark&#8221; _builder_version=&#8221;4.4.6&#8243; body_text_color=&#8221;#dd6c38&#8243; hover_enabled=&#8221;0&#8243; locked=&#8221;off&#8221;]MC45ODUwNzQ2MjY4NjU2NzE2[\/et_pb_dmb_code_snippet][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2021\/08\/HalvingGrid.png&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2021\/08\/HalvingGrid.png&#038;#8221<\/a>; title_text=&#8221;Halving Grid Search Output Model&#8221; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221; alt=&#8221;Halving Grid Search Output Model&#8221; hover_enabled=&#8221;0&#8243;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243; hover_enabled=&#8221;0&#8243;]<p style=\"text-align: justify;\">We do completely the same thing with Halving Random Search. It is interesting that with this approach we got the weirdest results. 
We may say that the model created this way is overfitting hard:<\/p>\n[\/et_pb_text][et_pb_dmb_code_snippet code=&#8221;aHlwZXJwYXJhbWV0ZXJzID0gewogICAgIkMiOiBzdGF0cy51bmlmb3JtKDUwMCwgMTUwMCksCiAgICAiZ2FtbWEiOiBzdGF0cy51bmlmb3JtKDAsIDEpLAogICAgJ2tlcm5lbCc6ICgnbGluZWFyJywgJ3JiZicpCn0KCnJhbmRvbSA9IEhhbHZpbmdSYW5kb21TZWFyY2hDVigKICAgICAgICAgICAgICAgIGVzdGltYXRvciA9IFNWQygpLCAKICAgICAgICAgICAgICAgIHBhcmFtX2Rpc3RyaWJ1dGlvbnMgPSBoeXBlcnBhcmFtZXRlcnMsIAogICAgICAgICAgICAgICAgY3YgPSAzLCAKICAgICAgICAgICAgICAgIHJhbmRvbV9zdGF0ZT00MiwgCiAgICAgICAgICAgICAgICBuX2pvYnMgPSAtMSkKCnJhbmRvbS5maXQoWF90cmFpbiwgeV90cmFpbikKCnByaW50KGYnQmVzdCBwYXJhbWV0ZXJzOiB7cmFuZG9tLmJlc3RfcGFyYW1zX30nKQpwcmludChmJ0Jlc3Qgc2NvcmU6IHtyYW5kb20uYmVzdF9zY29yZV99Jyk=&#8221; _builder_version=&#8221;4.4.6&#8243; hover_enabled=&#8221;0&#8243; locked=&#8221;off&#8221;]aHlwZXJwYXJhbWV0ZXJzID0gewogICAgIkMiOiBzdGF0cy51bmlmb3JtKDUwMCwgMTUwMCksCiAgICAiZ2FtbWEiOiBzdGF0cy51bmlmb3JtKDAsIDEpLAogICAgJ2tlcm5lbCc6ICgnbGluZWFyJywgJ3JiZicpCn0KCnJhbmRvbSA9IEhhbHZpbmdSYW5kb21TZWFyY2hDVigKICAgICAgICAgICAgICAgIGVzdGltYXRvciA9IFNWQygpLCAKICAgICAgICAgICAgICAgIHBhcmFtX2Rpc3RyaWJ1dGlvbnMgPSBoeXBlcnBhcmFtZXRlcnMsIAogICAgICAgICAgICAgICAgY3YgPSAzLCAKICAgICAgICAgICAgICAgIHJhbmRvbV9zdGF0ZT00MiwgCiAgICAgICAgICAgICAgICBuX2pvYnMgPSAtMSkKCnJhbmRvbS5maXQoWF90cmFpbiwgeV90cmFpbikKCnByaW50KGYnQmVzdCBwYXJhbWV0ZXJzOiB7cmFuZG9tLmJlc3RfcGFyYW1zX30nKQpwcmludChmJ0Jlc3Qgc2NvcmU6IHtyYW5kb20uYmVzdF9zY29yZV99Jyk=[\/et_pb_dmb_code_snippet][et_pb_dmb_code_snippet code=&#8221;QmVzdCBwYXJhbWV0ZXJzOiB7J0MnOiA1MzAuODc2NzQxNDQzNzAzNiwgJ2dhbW1hJzogMC45Njk5MDk4NTIxNjE5OTQzLCAna2VybmVsJzogJ3JiZid9CkJlc3Qgc2NvcmU6IDAuOTUwNjE3MjgzOTUwNjE3NA==&#8221; style=&#8221;dark&#8221; _builder_version=&#8221;4.4.6&#8243; body_text_color=&#8221;#dd6c38&#8243; hover_enabled=&#8221;0&#8243; 
locked=&#8221;off&#8221;]QmVzdCBwYXJhbWV0ZXJzOiB7J0MnOiA1MzAuODc2NzQxNDQzNzAzNiwgJ2dhbW1hJzogMC45Njk5MDk4NTIxNjE5OTQzLCAna2VybmVsJzogJ3JiZid9CkJlc3Qgc2NvcmU6IDAuOTUwNjE3MjgzOTUwNjE3NA==[\/et_pb_dmb_code_snippet][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2021\/08\/halvingrandom.png&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2021\/08\/halvingrandom.png&#038;#8221<\/a>; title_text=&#8221;Halving Random Search Model&#8221; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221; alt=&#8221;Halving Random Search Model&#8221; hover_enabled=&#8221;0&#8243;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221; hover_enabled=&#8221;0&#8243; module_id=&#8221;alter&#8221;]<h2 role=\"textbox\" aria-multiline=\"true\" class=\"rich-text editor-rich-text__editable block-editor-rich-text__editable is-selected\" contenteditable=\"true\" aria-label=\"Write heading\u2026\">7. Alternatives<\/h2>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">In general, the previously described methods are the most popular and the most frequently used. However, there are several <strong>alternatives<\/strong> that you can consider if the previous ones are not working out for you. One of them is <strong>Gradient-Based optimization<\/strong> of hyperparameter values. This technique calculates the gradient with respect to the hyperparameters and then optimizes them using the gradient descent algorithm. The problem with this approach is that for gradient descent to work well we need a function that is convex and smooth, which is often not the case when we talk about hyperparameters.
The other approach is the use of <strong>Evolutionary algorithms<\/strong> for optimization.<\/p>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243; header_2_line_height=&#8221;1.5em&#8221;]<h2 role=\"textbox\" aria-multiline=\"true\" class=\"rich-text editor-rich-text__editable block-editor-rich-text__editable is-selected\" contenteditable=\"true\" aria-label=\"Write heading\u2026\">Conclusion<\/h2>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\">In this article, we covered several well-known hyperparameter optimization and tuning algorithms. We learned how we can use Grid Search, Random Search, and Bayesian Optimization to get the best values for our hyperparameters. We also saw how we can utilize Sci-Kit Learn classes and methods to do so in code.<\/p>\n<p style=\"text-align: justify;\">Thank you for reading!<\/p>[\/et_pb_text][et_pb_text _builder_version=&#8221;4.4.6&#8243; custom_margin=&#8221;||1px|||&#8221; border_width_top=&#8221;1px&#8221; border_color_top=&#8221;rgba(51,51,51,0.38)&#8221; locked=&#8221;off&#8221;][\/et_pb_text][et_pb_image src=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2021\/05\/comercial.png&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2021\/05\/comercial.png&#038;#8221<\/a>; alt=&#8221;Ultimate Guide to Machine Learning with Python&#8221; title_text=&#8221;comercial&#8221; url=&#8221;<a href=\"https:\/\/rubikscode.net\/ultimate-guide-to-machine-learning-with-python\/&#038;#8221\">https:\/\/rubikscode.net\/ultimate-guide-to-machine-learning-with-python\/&#038;#8221<\/a>; align=&#8221;center&#8221; _builder_version=&#8221;4.4.6&#8243; max_height=&#8221;427px&#8221; locked=&#8221;off&#8221;][\/et_pb_image][et_pb_text _builder_version=&#8221;4.4.6&#8243; text_text_color=&#8221;#6b6b6b&#8221; border_width_bottom=&#8221;1px&#8221; border_color_bottom=&#8221;#6b6b6b&#8221; locked=&#8221;off&#8221;]<p style=\"text-align: center;\">This bundle of e-books is
specially crafted for\u00a0<strong>beginners<\/strong>.\nEverything from Python basics to the deployment of Machine Learning algorithms to production in one place.\nBecome a Machine Learning Superhero\u00a0<a href=\"https:\/\/rubikscode.net\/ultimate-guide-to-machine-learning-with-python\/\"><strong>TODAY<\/strong>!<\/a><\/p>[\/et_pb_text][et_pb_team_member name=&#8221;Nikola M. Zivkovic&#8221; image_url=&#8221;<a href=\"https:\/\/rubikscode.net\/wp-content\/uploads\/2021\/04\/LAD08605-scaled.jpg&#038;#8221\">https:\/\/rubikscode.net\/wp-content\/uploads\/2021\/04\/LAD08605-scaled.jpg&#038;#8221<\/a>; twitter_url=&#8221;<a href=\"https:\/\/twitter.com\/NMZivkovic&#038;#8221\" rel=\"nofollow\">https:\/\/twitter.com\/NMZivkovic&#038;#8221<\/a>; linkedin_url=&#8221;<a href=\"https:\/\/www.linkedin.com\/in\/nmzivkovic\/&#038;#8221\" rel=\"nofollow\">https:\/\/www.linkedin.com\/in\/nmzivkovic\/&#038;#8221<\/a>; _builder_version=&#8221;4.4.6&#8243;]<p style=\"text-align: justify;\"><span>Nikola M. Zivkovic is\u00a0<\/span><span>the author of books:\u00a0<\/span><a href=\"https:\/\/rubikscode.net\/ultimate-guide-to-machine-learning-with-python\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Ultimate Guide to Machine Learning<\/strong><\/a><span>\u00a0and\u00a0<\/span><a href=\"https:\/\/rubikscode.net\/deep-learning-for-programmers\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>Deep Learning for Programmers<\/strong><\/a><span>. He loves knowledge sharing, and he is an experienced speaker. 
You can find him speaking at\u00a0<\/span><span>meetups, conferences, and as a guest lecturer at the University of Novi Sad.<\/span><\/p>[\/et_pb_team_member][\/et_pb_column][\/et_pb_row][\/et_pb_section]\n","protected":false},"excerpt":{"rendered":"<p>In this article, we explore several optimization techniques, implement them in Python from scratch and explain how to use them with SciKit Learn.<\/p>\n","protected":false},"author":1,"featured_media":17955,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"_wpas_customize_per_network":false},"categories":[14067,608227693,832],"tags":[608227737,608228088,608228004,608227752,608227751,1131271,608227943,608227990,608228002,608228089,608228001,608228000,608228086,608228087,608228090,608227999,608227994,608227993,608227989,608227675,608228003,608227933],"class_list":["post-15045","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-machine-learning","category-python","tag-artificial-intelligence","tag-bayesian-hyperparameter-optimization","tag-bayesian-optimization","tag-data-science","tag-datascience","tag-deep-learning","tag-from-scratch","tag-gradient-descent","tag-grid-search","tag-hyperparameter-machine-learning","tag-hyperparameter-optimization"
,"tag-hyperparameter-tuning","tag-hyperparameter-tuning-python","tag-hyperparameter-tuning-random-forest","tag-hyperparameter-tuning-sklearn","tag-hyperparameters","tag-machine-learning-optimizers","tag-ml-optimization","tag-optimizers","tag-python","tag-random-search","tag-scikit-learn"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/rubikscode.net\/wp-content\/uploads\/2021\/08\/Featured-31.png","jetpack_shortlink":"https:\/\/wp.me\/p8G8I0-3UF","jetpack_likes_enabled":false,"jetpack_sharing_enabled":true,"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/rubikscode.net\/wp-json\/wp\/v2\/posts\/15045","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rubikscode.net\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rubikscode.net\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rubikscode.net\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rubikscode.net\/wp-json\/wp\/v2\/comments?post=15045"}],"version-history":[{"count":70,"href":"https:\/\/rubikscode.net\/wp-json\/wp\/v2\/posts\/15045\/revisions"}],"predecessor-version":[{"id":17954,"href":"https:\/\/rubikscode.net\/wp-json\/wp\/v2\/posts\/15045\/revisions\/17954"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rubikscode.net\/wp-json\/wp\/v2\/media\/17955"}],"wp:attachment":[{"href":"https:\/\/rubikscode.net\/wp-json\/wp\/v2\/media?parent=15045"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rubikscode.net\/wp-json\/wp\/v2\/categories?post=15045"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rubikscode.net\/wp-json\/wp\/v2\/tags?post=15045"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}