{"id":147,"date":"2019-10-27T15:06:31","date_gmt":"2019-10-27T15:06:31","guid":{"rendered":"https:\/\/practicalsecurityanalytics.com\/?p=147"},"modified":"2019-11-06T04:56:09","modified_gmt":"2019-11-06T04:56:09","slug":"pe-checksum","status":"publish","type":"post","link":"https:\/\/practicalsecurityanalytics.com\/pe-checksum\/","title":{"rendered":"Threat Hunting with the PE Checksum"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_is_the_PE_Checksum\"><\/span>What is the PE Checksum?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>When the portable executable format was developed, network connections were much less reliable than they are today. It was not uncommon for the integrity of a connection to be compromised and the data being transferred to become corrupted. Additionally, it was difficult for a client to detect whether or not a file was corrupted in transit. This was a significant problem, especially if you were downloading operating system files such as executables and drivers. 
A one-byte error in a driver could cause an unrecoverable system crash.<\/p>\n\n\n\n<p>As a result, checksums were implemented in the portable executable format with the express intent of detecting data corruption and reducing the probability of corrupted code being executed.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_is_a_Checksum\"><\/span>What is a Checksum?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>A checksum is a very basic hash: a small, fixed-size value calculated over the contents of a file, so that a change to the contents will usually change the result. Files with different contents will therefore usually have different checksums, but not always: a collision occurs when two files with different contents produce the same checksum. The probability of an accidental collision is relatively low, so a checksum catches corruption the majority of the time but is not a guarantee.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"How_are_Checksums_Implemented_in_PE_Files\"><\/span>How are Checksums Implemented in PE Files?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>In an executable, the header contains a field for the checksum of the file. Typically, the compiler generates the checksum at compile time and writes the value into the checksum field.<\/p>\n\n\n\n<p>Microsoft&#8217;s PE format specification describes the checksum field as follows:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>The image file checksum. The algorithm for computing the checksum is incorporated into IMAGHELP.DLL. 
The following are checked for validation at load time: all drivers, any DLL loaded at boot time, and any DLL that is loaded into a critical Windows process.<\/p><cite>https:\/\/docs.microsoft.com\/en-us\/windows\/win32\/debug\/pe-format<\/cite><\/blockquote>\n\n\n\n<p>Basically, before loading drivers and certain DLLs, Windows will use a function in IMAGHELP.DLL to calculate the checksum of the file. It will then compare that checksum to the value stored in the PE header. If the two checksums match, the driver or DLL will be loaded. If not, the Windows loader will assume the file was corrupted and prevent the driver or DLL from being loaded.<\/p>\n\n\n\n<p>Interestingly, ordinary executables are not validated against the checksum field, so that field does not have to contain a valid checksum for Windows to run them.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1448\" src=\"https:\/\/i2.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/1024px-Portable_Executable_32_bit_Structure_in_SVG_fixed.svg_.jpg?fit=724%2C1024&amp;ssl=1\" alt=\"\" class=\"wp-image-161\" srcset=\"https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/1024px-Portable_Executable_32_bit_Structure_in_SVG_fixed.svg_.jpg?w=1024&amp;quality=100&amp;ssl=1 1024w, https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/1024px-Portable_Executable_32_bit_Structure_in_SVG_fixed.svg_.jpg?resize=212%2C300&amp;quality=100&amp;ssl=1 212w, https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/1024px-Portable_Executable_32_bit_Structure_in_SVG_fixed.svg_.jpg?resize=768%2C1086&amp;quality=100&amp;ssl=1 768w, 
https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/1024px-Portable_Executable_32_bit_Structure_in_SVG_fixed.svg_.jpg?resize=724%2C1024&amp;quality=100&amp;ssl=1 724w, https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/1024px-Portable_Executable_32_bit_Structure_in_SVG_fixed.svg_.jpg?resize=636%2C900&amp;quality=100&amp;ssl=1 636w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" \/><figcaption><strong>Figure 1:<\/strong> PE Header Format (Source: Wikipedia)<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Algorithm\"><\/span>Algorithm<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Although it is sometimes compared to a CRC32 checksum, the algorithm is quite different. It is a custom additive checksum developed by Microsoft that is not officially published. There are also very few libraries that implement it. I ended up building my own version from a few posts on StackOverflow <a href=\"https:\/\/stackoverflow.com\/questions\/6429779\/can-anyone-define-the-windows-pe-checksum-algorithm\">here<\/a>. 
The implementation below is written in C++\/CLI, but it gives a good overview of how the algorithm works.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>UInt32 GetChecksum(cli::array&lt;Byte>^ data, Int32 checksumOffset) {\n\t\/\/Pin the managed array so it can be read as 32-bit words\n\tpin_ptr&lt;unsigned char> pin = &amp;data[0];\n\tunsigned int* pointer = (unsigned int*)pin;\n\tlong long int checksum = 0;\n\tlong long int top = Utility::Pow((long long int)2, (long long int)32);\n\n\t\/\/Sum the words before the checksum field, folding any carry back in\n\tfor (int i = 0; i &lt; checksumOffset \/ 4; i++) {\n\t\tunsigned int temp = pointer[i];\n\t\tchecksum = (checksum &amp; 0xffffffff) + temp + (checksum >> 32);\n\t\tif (checksum > top) {\n\t\t\tchecksum = (checksum &amp; 0xffffffff) + (checksum >> 32);\n\t\t}\n\t}\n\n\t\/\/Skip the 4-byte checksum field itself and sum the remaining whole words\n\tint stop = data->Length \/ 4;\n\tfor (int i = checksumOffset \/ 4 + 1; i &lt; stop; i++) {\n\t\tunsigned int temp = pointer[i];\n\t\tchecksum = (checksum &amp; 0xffffffff) + temp + (checksum >> 32);\n\t\tif (checksum > top) {\n\t\t\tchecksum = (checksum &amp; 0xffffffff) + (checksum >> 32);\n\t\t}\n\t}\n\n\t\/\/Perform the same calculation on the padded remainder\n\tint remainder = data->Length % 4;\n\tif (remainder != 0) {\n\t\tcli::array&lt;Byte>^ a = gcnew cli::array&lt;Byte>(4);\n\t\tint index = data->Length - remainder;\n\t\tfor (int i = 0; i &lt; 4; i++) {\n\t\t\tif (i &lt; remainder) {\n\t\t\t\ta[i] = data[index + i];\n\t\t\t}\n\t\t\telse {\n\t\t\t\ta[i] = 0;\n\t\t\t}\n\t\t}\n\t\tpin_ptr&lt;unsigned char> pin2 = &amp;a[0];\n\t\tunsigned int* pointer2 = (unsigned int*)pin2;\n\n\t\tunsigned int temp = pointer2[0];\n\t\tchecksum = (checksum &amp; 0xffffffff) + temp + (checksum >> 32);\n\t\tif (checksum > top) {\n\t\t\tchecksum = (checksum &amp; 0xffffffff) + (checksum >> 32);\n\t\t}\n\t}\n\n\t\/\/Fold the sum down to 16 bits\n\tchecksum = (checksum &amp; 0xffff) + (checksum >> 16);\n\tchecksum = (checksum)+(checksum >> 16);\n\tchecksum = checksum &amp; 0xffff;\n\n\t\/\/Add the file length to produce the final checksum\n\tchecksum += (unsigned int)data->Length;\n\n\treturn (unsigned int)checksum;\n}<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><span 
class=\"ez-toc-section\" id=\"How_does_this_apply_to_intrusion_detection\"><\/span>How does this apply to intrusion detection?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>By now, you are probably wondering, &#8220;So what? How does this help me identify and triage malware?&#8221; As it turns out, there is a strong correlation between invalid PE checksums and malware. The graph below illustrates the disparity between malicious and legitimate executables with respect to valid and invalid checksums.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"868\" height=\"709\" src=\"https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/pechecksum.png?resize=868%2C709&#038;quality=100&#038;ssl=1\" alt=\"\" class=\"wp-image-148\" srcset=\"https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/pechecksum.png?w=868&amp;quality=100&amp;ssl=1 868w, https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/pechecksum.png?resize=300%2C245&amp;quality=100&amp;ssl=1 300w, https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/pechecksum.png?resize=768%2C627&amp;quality=100&amp;ssl=1 768w\" sizes=\"auto, (max-width: 868px) 100vw, 868px\" \/><figcaption><strong>Figure 2:<\/strong> PE Checksum Statistics<\/figcaption><\/figure>\n\n\n\n<p>The graph shows two datasets: legitimate and malicious executables. 
The x-axis shows the two possible results (valid or invalid) of the PE checksum validation, and the y-axis shows the percentage of each dataset.<\/p>\n\n\n\n<p>The key takeaways from this graph are:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>83% of malware had invalid checksums.<\/li><li>90% of legitimate files had valid checksums.<\/li><\/ul>\n\n\n\n<p>As it turns out, the PE header checksum is the single greatest stand-alone indicator of malware we will discuss in this series, even more so than digital signatures (we&#8217;ll talk about why that is in the post about digital signatures).<\/p>\n\n\n\n<p>There are two main reasons that the checksum will be invalid: (1) some compilers used by malware authors don&#8217;t support generating the checksum, and (2) some authors modify the executable post-compilation, which invalidates the checksum. Generally speaking, most malware authors do not go back and update the checksum after the modifications. Malware authors also have a strong incentive to modify executables in order to evade detection by AV. Packing, encryption, encoding, and compression can all be used to obfuscate the parts of the malware that a signature could match. Many of these tools are designed to work on executables post-compilation so that the malware itself does not need to be rebuilt, and many of them do not update the checksum after modifying the executable.<\/p>\n\n\n\n<p>The end result is that 83% of executable malware has invalid PE checksums, which is huge! It is rare for so many variants of malware to share a common suspicious trait. Even with such a widespread feature, PE checksums by themselves will not always land you positive detections.<\/p>\n\n\n\n<p>Let&#8217;s say you are trying to defend a small network where you have found 100,000 unique executables but only one of those is malicious. 
If you were to try to identify malware based solely on the PE checksum, then you would have approximately 10,000 false positives (10% of all legitimate files have invalid checksums) and, at best, 1 true positive. In other words, roughly 99.99% of the files you flag (10,000 out of 10,001) would be false positives.<\/p>\n\n\n\n<p>At the same time, you&#8217;ve reduced the amount of hay in your haystack by 90%, making it much easier to find that one needle. Combining the PE checksum with other features will continue to narrow your focus.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Summary\"><\/span>Summary<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>The PE checksum was designed to reduce the probability of data corruption in a DLL or driver leading to crashes in the operating system. The checksum is calculated by the compiler after it builds the executable, and any modifications to the binary post-compilation will invalidate the checksum. Malware authors commonly encode, encrypt, compress, or pack their malware post-compilation, but often do not update the checksum. As a result, 83% of malware samples have invalid PE checksums, compared with only 10% of legitimate files.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>What is the PE Checksum? When the portable executable format was developed, network connections were much less reliable than they are today. It was not uncommon for the integrity of a connection to be compromised and the data being transferred to become corrupted. 
Additionally, it was difficult for a client to detect whether or not [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":148,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"advgb_blocks_editor_width":"","advgb_blocks_columns_visual_guide":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[2,5],"tags":[],"class_list":["post-147","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog-posts","category-executable-features-series"],"author_meta":{"display_name":"pracsec","author_link":"https:\/\/practicalsecurityanalytics.com\/author\/michael-lester-main\/"},"featured_img":"https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/pechecksum.png?fit=300%2C245&quality=100&ssl=1","jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/pechecksum.png?fit=868%2C709&quality=100&ssl=1","coauthors":[],"tax_additional":{"categories":{"linked":["<a href=\"https:\/\/practicalsecurityanalytics.com\/category\/blog-posts\/\" class=\"advgb-post-tax-term\">Blog Posts<\/a>","<a href=\"https:\/\/practicalsecurityanalytics.com\/category\/blog-posts\/executable-features-series\/\" class=\"advgb-post-tax-term\">Executable Features Series<\/a>"],"unlinked":["<span class=\"advgb-post-tax-term\">Blog Posts<\/span>","<span class=\"advgb-post-tax-term\">Executable Features Series<\/span>"]}},"comment_count":"0","relative_dates":{"created":"Posted 6 years ago","modified":"Updated 6 years 
ago"},"absolute_dates":{"created":"Posted on October 27, 2019","modified":"Updated on November 6, 2019"},"absolute_dates_time":{"created":"Posted on October 27, 2019 3:06 pm","modified":"Updated on November 6, 2019 4:56 am"},"featured_img_caption":"","series_order":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pbnFRW-2n","_links":{"self":[{"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/posts\/147","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/comments?post=147"}],"version-history":[{"count":5,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/posts\/147\/revisions"}],"predecessor-version":[{"id":272,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/posts\/147\/revisions\/272"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/media\/148"}],"wp:attachment":[{"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/media?parent=147"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/categories?post=147"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/tags?post=147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}