{"id":47,"date":"2019-10-17T21:47:29","date_gmt":"2019-10-17T21:47:29","guid":{"rendered":"http:\/\/box5854.temp.domains\/~practkx5\/?p=47"},"modified":"2022-01-21T22:54:21","modified_gmt":"2022-01-21T22:54:21","slug":"file-entropy","status":"publish","type":"post","link":"https:\/\/practicalsecurityanalytics.com\/file-entropy\/","title":{"rendered":"Threat Hunting with File Entropy"},"content":{"rendered":"\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 ez-toc-wrap-left counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #000000;color:#000000\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #000000;color:#000000\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" 
href=\"https:\/\/practicalsecurityanalytics.com\/file-entropy\/#What_is_Entropy\" >What is Entropy?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/practicalsecurityanalytics.com\/file-entropy\/#How_does_this_apply_to_intrusion_detection\" >How does this apply to intrusion detection?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/practicalsecurityanalytics.com\/file-entropy\/#Calculating_Entropy_with_Sigcheck\" >Calculating Entropy with Sigcheck<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/practicalsecurityanalytics.com\/file-entropy\/#Top_Packers\" >Top Packers<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/practicalsecurityanalytics.com\/file-entropy\/#False_Positives\" >False Positives<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/practicalsecurityanalytics.com\/file-entropy\/#Techniques_for_Reducing_False_Positives\" >Techniques for Reducing False Positives<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/practicalsecurityanalytics.com\/file-entropy\/#Adversary_Techniques_for_Reducing_Entropy\" >Adversary Techniques for Reducing Entropy<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/practicalsecurityanalytics.com\/file-entropy\/#Conclusion\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"What_is_Entropy\"><\/span>What is Entropy?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Entropy is a measure of randomness within a set of data. 
When used in the context of information theory and cybersecurity, the term usually refers to Shannon entropy. Calculated over the bytes of a file, Shannon entropy is a value between 0 and 8 (bits per byte), where values near 8 indicate that the data is very random, while values near 0 indicate that the data is very homogeneous.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"How_does_this_apply_to_intrusion_detection\"><\/span>How does this apply to intrusion detection?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Shannon entropy can be a good indicator for detecting the use of packing, compression, and encryption in a file. Each of these techniques tends to increase the overall entropy of a file. This makes sense intuitively. Let&#8217;s take compression as an example. Compression algorithms reduce the size of certain types of data by replacing duplicated parts with references to a single instance of that part. The end result is a file with less duplicated content. 
The less duplication there is in a file, the higher the entropy will be because the data is less predictable than it was before.<\/p>\n\n\n\n<p>As it turns out, malware authors also tend to rely heavily on packing, compression, and encryption to obfuscate their tools in order to evade signature-based detection systems.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"856\" height=\"709\" src=\"https:\/\/i0.wp.com\/box5854.temp.domains\/~practkx5\/wp-content\/uploads\/2019\/10\/Picture1.png?resize=856%2C709&#038;quality=100\" alt=\"\" class=\"wp-image-66\" srcset=\"https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/Picture1.png?w=856&amp;quality=100&amp;ssl=1 856w, https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/Picture1.png?resize=300%2C248&amp;quality=100&amp;ssl=1 300w, https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/Picture1.png?resize=768%2C636&amp;quality=100&amp;ssl=1 768w\" sizes=\"auto, (max-width: 856px) 100vw, 856px\" \/><figcaption><strong>Figure 1:<\/strong> Histogram of entropy of legitimate versus malicious files.<\/figcaption><\/figure><\/div>\n\n\n\n<p>The data in Figure 1 was derived from a set of approximately 500K legitimate and malicious 32-bit and 64-bit Portable Executable (PE) files. The malicious files came primarily from VirusShare, Malwr, dasmalwerk.eu, CAPE Sandbox, and Contagio. The legitimate files came from scrapes of production Windows 7 and Windows 10 systems. This means that the data may not be as comprehensive as what you might get from VirusTotal, which has far more samples, and it is specific to compiled PE files. 
The data is also predominantly from VirusShare, and is therefore specific to their collection sources, but some basic trends can be identified.<\/p>\n\n\n\n<p>A few things stand out in this graph:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Legitimate files tend to have an entropy between 4.8 and 7.2.<\/li><li>Files with an entropy above 7.2 tend to be malicious.<\/li><li>Nearly 30% of the malicious samples have an entropy near 8.0, while only 1% of legitimate samples do.<\/li><li>Approximately 55% of all malicious samples have an entropy of 7.2 or more, versus 8% of legitimate samples.<\/li><\/ol>\n\n\n\n<p>From this chart, you can see that entropy is a strong feature for distinguishing between legitimate and malicious files. If an adversary has used some form of compression, packing, or encryption, it is likely to raise the entropy of the file, which will stand out in this type of analysis.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Calculating_Entropy_with_Sigcheck\"><\/span>Calculating Entropy with Sigcheck<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>If you are looking at a single file, Sigcheck by Sysinternals can calculate its entropy using the command below.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>D:\\Malware\\&gt; sigcheck.exe -h -a \"D:\\Malware\\11\"\n\nSigcheck v2.54 - File version and signature viewer\nCopyright (C) 2004-2016 Mark Russinovich\nSysinternals - www.sysinternals.com\n\n:\\malware\\11:\n       Verified:       Unsigned\n       Link date:      2:02 PM 10\/15\/2013\n       Publisher:      n\/a\n       Company:        n\/a\n       Description:    n\/a\n       Product:        n\/a\n       Prod version:   n\/a\n       File version:   n\/a\n       MachineType:    32-bit\n       Binary Version: n\/a\n       Original Name:  n\/a\n       Internal Name:  n\/a\n       Copyright:      n\/a\n       Comments:       n\/a\n       Entropy:        7.997\n       MD5:    000A2E8EB96F3AF556E3299541B03F00\n       SHA1:   3AB630A357F05EDA98CC6DAC06BE79815735216D\n       PESHA1: 610B2B33E1F7840FE5E4B1ADC2E9FEDD1D5E26E2\n       PE256:  D845C8A92CEF8726B952EAAE53F3768471D6EA0EDF7CDE11D0453429A820C929\n       SHA256: 0E40E014381E3F70054B41BA24EFDF86CCA272CFD8A66566B0662AC29A57FF7D\n       IMP:    BF5AB190F10D097C8183FD4D65042281<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Top_Packers\"><\/span>Top Packers<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<table class=\"wp-block-table is-style-regular\"><tbody><tr><td><strong>Name<\/strong><\/td><td><strong>Malicious (Blacklist)<\/strong><\/td><td><strong>Legitimate (Whitelist)<\/strong><\/td><\/tr><tr><td>Yodas Protector<\/td><td>6.69%<\/td><td>0.11%<\/td><\/tr><tr><td>Ultimate Packer for Executables (UPX)<\/td><td>3.55%<\/td><td>0.09%<\/td><\/tr><tr><td>Armadillo<\/td><td>3.26%<\/td><td>2.32%<\/td><\/tr><caption><strong>Figure 2:<\/strong> Top Packers<\/caption><\/tbody><\/table>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"False_Positives\"><\/span>False Positives<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Many application authors do not use encryption, packing, compression, or encoding on their binaries, but there are some legitimate reasons to use those techniques. For example, some companies will use encryption or obfuscation techniques on their software in order to make it more difficult to reverse engineer and thus protect their intellectual property.<\/p>\n\n\n\n<p>Other application authors will use compression to shrink their binaries and reduce download times for their products. The data in Figure 2 shows that 2.3% of all legitimate files in this dataset are packed with the Armadillo packer. 
With such a high incidence of false positives, we are going to need to dig deeper in order to effectively triage an unknown binary.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Techniques_for_Reducing_False_Positives\"><\/span>Techniques for Reducing False Positives<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Unfortunately, entropy is such a strong feature that it sometimes becomes the <em>only<\/em> distinguishing factor between legitimate and malicious files, which, as we just discussed, can cause false positives. This is where it becomes necessary to rely on other features.<\/p>\n\n\n\n<p>When attempting to triage a sample that has a high entropy, a good next step is to run PEiD signatures against it in order to determine what packing algorithm or software may have been used. The <a class=\"clink\" href=\"https:\/\/practicalsecurityanalytics.com\/malware-analysis-center\/\">Malware Analysis Center<\/a> will automatically do this for all samples submitted to it. As seen in Figure 2, different algorithms have different probabilities of being legitimate or malicious. For example, the Armadillo packer is much more likely to be a false positive than the Yodas Protector packer.<\/p>\n\n\n\n<p>If there is a high level of entropy in the file, and no PEiD signature fires, then that is particularly suspicious. Given that information, I would most likely open up an investigation on that finding and move on to automated dynamic analysis using Cuckoo, FireEye, WildFire, or whatever automated sandbox I have access to at the time. The reason is that most legitimate software uses a well-known packer rather than a custom one, because there is little to gain from having your own custom packer when writing legitimate software. 
This is not the case with malware, where having your own custom packer can prevent antivirus engines from being able to unpack your code and run signatures against it.<\/p>\n\n\n\n<p>If a well-known packing algorithm signature fires, then the next step is to attempt to unpack that software in order to see what&#8217;s inside. Unpacking services such as <a class=\"clink\" href=\"https:\/\/www.unpac.me\/about#\/\">UnpacMe<\/a> can be used to extract packed binaries for further analysis.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Adversary_Techniques_for_Reducing_Entropy\"><\/span>Adversary Techniques for Reducing Entropy<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>There are several ways to keep the entropy of a file low in order to make that file seem more legitimate. One easy way is to use single-byte XOR encoding. While it is not a very strong cipher, it has the advantage of not changing the overall entropy of the file: XORing every byte with the same key only relabels the byte values without changing how often each one occurs. This is one type of encryption that entropy analysis will not help you with.<\/p>\n\n\n\n<p>Another technique for reducing entropy is to reduce the amount of data encrypted relative to the overall size of the file. There are two ways to do this: (1) reduce the amount of data encrypted or (2) increase the amount of non-encrypted data. The purpose of encryption is to prevent AV vendors from flagging on signatures. Encrypting the whole file will look suspicious from an entropy perspective, but that may not be necessary. Only the signaturizable parts of the malware really need to be encrypted. If only 10% of the file is encrypted, then that section has a much lower impact on the overall entropy of the file.<\/p>\n\n\n\n<p>The other technique is to add normal, legitimate data to the file. One way of doing this is by statically compiling legitimate code that will not be needed into the executable. This is effective not only at reducing entropy but also at tricking AI-based antivirus. 
Researchers at Skylight Cyber were able to trick the AI-based Cylance Protect system by adding strings from legitimate applications into malicious files in order to evade detection.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Conclusion\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>Entropy is a strong indicator of packing, encryption, and compression, which are all techniques commonly used by malware. Approximately 55% of the malware samples in this dataset have an entropy of 7.2 or greater. Like all features, entropy does not tell the whole story. There are legitimate reasons to use packing, encryption, and compression on binaries, so you must be able to dig a little deeper once you have identified a sample of interest with a high entropy. Packing signatures followed by automated dynamic sandboxing and unpacking applications can help you take that next step.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Using entropy as a feature to find 
malware.<\/p>\n","protected":false},"author":2,"featured_media":66,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"advgb_blocks_editor_width":"","advgb_blocks_columns_visual_guide":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[2,5],"tags":[],"class_list":["post-47","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog-posts","category-executable-features-series"],"author_meta":{"display_name":"pracsec","author_link":"https:\/\/practicalsecurityanalytics.com\/author\/michael-lester-main\/"},"featured_img":"https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/Picture1.png?fit=300%2C248&quality=100&ssl=1","jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/practicalsecurityanalytics.com\/wp-content\/uploads\/2019\/10\/Picture1.png?fit=856%2C709&quality=100&ssl=1","coauthors":[],"tax_additional":{"categories":{"linked":["<a href=\"https:\/\/practicalsecurityanalytics.com\/category\/blog-posts\/\" class=\"advgb-post-tax-term\">Blog Posts<\/a>","<a href=\"https:\/\/practicalsecurityanalytics.com\/category\/blog-posts\/executable-features-series\/\" class=\"advgb-post-tax-term\">Executable Features Series<\/a>"],"unlinked":["<span class=\"advgb-post-tax-term\">Blog Posts<\/span>","<span class=\"advgb-post-tax-term\">Executable Features Series<\/span>"]}},"comment_count":"6","relative_dates":{"created":"Posted 6 years ago","modified":"Updated 4 years 
ago"},"absolute_dates":{"created":"Posted on October 17, 2019","modified":"Updated on January 21, 2022"},"absolute_dates_time":{"created":"Posted on October 17, 2019 9:47 pm","modified":"Updated on January 21, 2022 10:54 pm"},"featured_img_caption":"","series_order":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pbnFRW-L","_links":{"self":[{"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/posts\/47","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/comments?post=47"}],"version-history":[{"count":5,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/posts\/47\/revisions"}],"predecessor-version":[{"id":585,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/posts\/47\/revisions\/585"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/media\/66"}],"wp:attachment":[{"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/media?parent=47"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/categories?post=47"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/practicalsecurityanalytics.com\/wp-json\/wp\/v2\/tags?post=47"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}