<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.8.5">Jekyll</generator><link href="https://badhomb.re//feed.xml" rel="self" type="application/atom+xml" /><link href="https://badhomb.re//" rel="alternate" type="text/html" /><updated>2020-10-06T19:43:44-04:00</updated><id>https://badhomb.re//feed.xml</id><title type="html">Santiago Torres-Arias</title><subtitle>Santiago Torres-Arias's personal website</subtitle><entry><title type="html">How to easily try out TUF + in-toto</title><link href="https://badhomb.re//ci/security/2020/05/01/tuf-in-toto.html" rel="alternate" type="text/html" title="How to easily try out TUF + in-toto" /><published>2020-05-01T17:00:00-04:00</published><updated>2020-05-01T17:00:00-04:00</updated><id>https://badhomb.re//ci/security/2020/05/01/tuf-in-toto</id><content type="html" xml:base="https://badhomb.re//ci/security/2020/05/01/tuf-in-toto.html">&lt;p&gt;I’ve been speaking quite a lot with quite a lot of people about the benefits of
in-toto and TUF together. Indeed, my follow-up after saying “hey, you don’t
&lt;em&gt;need&lt;/em&gt; TUF to use in-toto” is “but they do go really well together”.
I’ve done it so much that by now I have a well-rehearsed canned answer as to
why they go well. It was only a matter of will and free time (look ma! I’m a
Doctor now!) before I decided to dust off this blog and share why it matters
and — more importantly — how you can see it for yourself in four easy
steps.&lt;/p&gt;

&lt;h2 id=&quot;what-is-tuf-and-in-toto-and-how-they-are-different&quot;&gt;What are TUF and in-toto, and how are they different?&lt;/h2&gt;

&lt;p&gt;So, as I said, I generally start my engagements by asserting that TUF !=
in-toto. The two are often conflated because they come from similar teams and
follow similar design principles. They do serve the same overarching goal,
the &lt;em&gt;secure delivery of content&lt;/em&gt;, but they provide different
&lt;em&gt;security properties&lt;/em&gt;. So, to make things super clear: they are not the
same — they complement each other.&lt;/p&gt;

&lt;h3 id=&quot;what-is-tuf-tuf-stores-stuff-and-does-other-stuff-as-well&quot;&gt;What is TUF? TUF stores sTUFf (and does other stuff as well)&lt;/h3&gt;

&lt;p&gt;TUF started as The Update Framework but, as noted by a lot of people, it is
actually a very neat way to provide trust information about arbitrary
collections of software elements (anything you can hash, really). We oftentimes
refer to these as &lt;em&gt;software artifacts&lt;/em&gt;. However, TUF can &lt;em&gt;also&lt;/em&gt; provide trust
information about other metainformation about these artifacts. Think of TUF as
a very powerful mechanism to store sTUFf securely.&lt;/p&gt;

&lt;p&gt;Not only does TUF store sTUFf, but it also lets you make sure these artifacts
actually came from whoever should’ve put them there. Say that you trust your
friend Eliza, who owns a pharmacy, to give you a bottle of pills. With TUF (and
if we were able to hash bottles of pills), you could make sure that this bottle
of pills was put on the counter (i.e., a repository) by Eliza and Eliza only.
In other words, TUF ensures the &lt;em&gt;authenticity of the provider of the data it’s
storing&lt;/em&gt;.&lt;/p&gt;
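To make the idea concrete, here is a toy sketch of the two checks involved: did Eliza vouch for the metadata, and does the artifact match what the metadata promises? This is not real TUF code — TUF signs JSON metadata roles with asymmetric keys, while this stand-in uses an HMAC purely to keep the sketch self-contained.

```python
import hashlib
import hmac
import json

# Hypothetical stand-in for Eliza's signing key (real TUF: asymmetric keys)
ELIZA_KEY = b"eliza-shared-secret"

bottle = b"pills-v1"  # the artifact: anything you can hash
metadata = {"targets": {"pills.tar.gz": {"sha256": hashlib.sha256(bottle).hexdigest()}}}
payload = json.dumps(metadata, sort_keys=True).encode()
signature = hmac.new(ELIZA_KEY, payload, hashlib.sha256).hexdigest()

def verify_download(name, blob, payload, signature):
    # 1) authenticity: did Eliza actually vouch for this metadata?
    expected = hmac.new(ELIZA_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False
    # 2) integrity: does the downloaded artifact match the recorded hash?
    targets = json.loads(payload)["targets"]
    return hashlib.sha256(blob).hexdigest() == targets[name]["sha256"]

print(verify_download("pills.tar.gz", bottle, payload, signature))       # True
print(verify_download("pills.tar.gz", b"swapped!", payload, signature))  # False
```

A bottle that doesn’t hash to what Eliza signed for, or metadata Eliza never signed, both get rejected.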

&lt;p&gt;If we were talking in person I would’ve alluded to my fantastic car salesman
voice, but given that you’re reading this you’ll have to do the heavy lifting,
because there’s more!&lt;/p&gt;

&lt;p&gt;TUF also ensures other very important things, like the software artifacts being
fresh. That is, that your bottle of pills from the previous example is not
actually an old one.&lt;/p&gt;
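TUF gets this freshness by stamping an expiration date into the signed metadata itself, so clients can reject anything stale. A minimal sketch of that check (the `expires` field name matches real TUF metadata; the parsing here is simplified):

```python
from datetime import datetime

# Signed TUF metadata carries an "expires" timestamp; stale metadata is rejected
metadata = {"expires": "2020-06-01T00:00:00Z"}

def is_fresh(meta, now):
    expires = datetime.strptime(meta["expires"], "%Y-%m-%dT%H:%M:%SZ")
    return now < expires

print(is_fresh(metadata, datetime(2020, 5, 1)))  # True: bottle still good
print(is_fresh(metadata, datetime(2021, 5, 1)))  # False: old pills, reject
```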

&lt;p&gt;Lastly, and very importantly, it also ensures that the repository where all
these software artifacts live presents a consistent state. This is important
because, as has been noted before, attackers can &lt;em&gt;mix and match&lt;/em&gt;
software artifacts in such a way that the sum of their parts is actually
malicious. This would be akin to somebody mixing different versions of
Eliza’s pharmaceutical offerings and tricking you into taking two very
incompatible chemicals&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;  — technically, you got it all from Eliza, didn’t
you?&lt;/p&gt;

&lt;p&gt;So, to wrap this up, imagine TUF as a system that ensures that you:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Got what you wanted&lt;/li&gt;
  &lt;li&gt;Got it from whoever was supposed to put it there&lt;/li&gt;
  &lt;li&gt;That it’s not expired or old&lt;/li&gt;
  &lt;li&gt;And that the repository you got it from is in a consistent state, which
rules out mix-and-match attacks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;(Now, TUF does other neat things, but these are the big selling points in my
opinion)&lt;/p&gt;

&lt;p&gt;Great, so I’ve told you about TUF, let me go ahead and introduce its sister:
in-toto.&lt;/p&gt;

&lt;h3 id=&quot;in-toto-answers-how-eliza-got-her-stuff&quot;&gt;in-toto answers how Eliza got her sTUFf&lt;/h3&gt;

&lt;p&gt;Now, a crucial question that you may ask about Eliza’s pharmacy is: “well, I
trust Eliza, but is she actually selling me things that were FDA approved and
whatnot?” And the truth is, unfortunately, you don’t know. In the world of
software, as the stakes are, you either walk the chain back yourself or hope
that people aren’t lying about what they put in their software repositories.
This is where in-toto comes in: it lets you do cool things like put FDA
approval stamps on Eliza’s pills.&lt;/p&gt;

&lt;p&gt;To do this, in-toto creates a cryptographic paper trail that’s very akin to a
Bill of Materials (in fact, in-toto is closely related to cryptographically
enforceable bills of materials), so that you can walk a
&lt;strong&gt;cryptographically-authenticated paper trail&lt;/strong&gt; from your bottle of pills (err,
software artifact) all the way back to the raw materials that created it
(e.g., source code, configuration files, etc.).&lt;/p&gt;

&lt;p&gt;This way, you have a very strong combination of products: one that ensures
everything is very tightly sealed and packaged (TUF), and one that gives you
cryptographic visibility into the process that produced what you just got
(in-toto).&lt;/p&gt;
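The chaining idea can be sketched in a few lines: each step of the pipeline records the hashes of what went in (materials) and what came out (products), and verification checks that the links line up. This is only the core intuition — real in-toto links are signed and checked against a layout’s artifact rules.

```python
import hashlib

def h(data):
    return hashlib.sha256(data).hexdigest()

# Toy "links", one per pipeline step (real ones are signed JSON files)
links = [
    {"name": "write-code", "materials": {}, "products": {"foo.py": h(b"print('hi')")}},
    {"name": "package", "materials": {"foo.py": h(b"print('hi')")},
     "products": {"demo-package.tar.gz": h(b"tarball-bytes")}},
]

def chain_is_consistent(links):
    # every material a step consumed must match a product of an earlier step
    produced = {}
    for link in links:
        for path, digest in link["materials"].items():
            if produced.get(path) != digest:
                return False  # somebody swapped an artifact mid-pipeline
        produced.update(link["products"])
    return True

print(chain_is_consistent(links))  # True
```

If an attacker slips a different foo.py into the packaging step, its hash no longer matches the product of write-code and verification fails.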

&lt;h2 id=&quot;tuf-and-in-toto-together&quot;&gt;TUF and in-toto together&lt;/h2&gt;

&lt;p&gt;The basic idea is simple: we will use in-toto in the pipeline to create the
paper trail, and then we will use TUF to store all the sTUFf. We will do this
in basically four steps:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Initialize a TUF repository&lt;/li&gt;
  &lt;li&gt;Create an in-toto layout and register it in a special place in the TUF
repository&lt;/li&gt;
  &lt;li&gt;Carry out your pipeline as you normally would, but create in-toto
attestations that are submitted to a TUF repository&lt;/li&gt;
  &lt;li&gt;Profit&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I took the liberty of gathering a bunch of demos that the in-toto and TUF
communities put together over countless hours and stitched them together into
this four-step process to profit.  You can get it from
&lt;a href=&quot;https://github.com/SantiagoTorres/tuf_in_toto_demo&quot;&gt;here&lt;/a&gt;; it has a
bunch of submodules, so make sure you clone recursively (with the --recursive
flag).&lt;/p&gt;

&lt;p&gt;So let’s go and do it!&lt;/p&gt;

&lt;h3 id=&quot;0-set-up-your-environment&quot;&gt;0. Set up your environment&lt;/h3&gt;

&lt;p&gt;Ok, I lied. It’s five steps. The first one is to install TUF and in-toto. You
can probably use a virtualenv and install them from pip:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ pip install tuf in-toto pynacl cryptography
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Done. I also threw in the pynacl and cryptography dependencies so you can use
any keys you like — this deal won’t last forever!&lt;/p&gt;
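If you haven’t used virtualenvs before, the standard dance looks like this (the directory name `demo-env` is arbitrary):

```shell
# create and activate a throwaway virtualenv, then install inside it
python3 -m venv demo-env
. demo-env/bin/activate
# now run: pip install tuf in-toto pynacl cryptography
python -c "import sys; print(sys.prefix)"  # should point into demo-env
```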

&lt;h3 id=&quot;1-initialize-the-tuf-repository&quot;&gt;1. Initialize the TUF repository&lt;/h3&gt;

&lt;p&gt;So, you can grok the script under scripts/init.py, or you can blindly execute
my code (tough choice, I know). Either way, once you run it or read it you’ll
find out that it basically does the following:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Sets up a place to store your repository metadata (think of it as git
init-ing things)&lt;/li&gt;
  &lt;li&gt;Goes ahead and sets up the trust relationships (i.e., it says, I trust this
key for packages, this one for layouts and links).&lt;/li&gt;
  &lt;li&gt;Sets up your client environment (i.e., copies the root of trust into the
client directory)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Yes, with about 50 lines of code you have your own shiny TUF/in-toto repo.
Order today!&lt;/p&gt;

&lt;h3 id=&quot;2-creating-and-publishing-a-layout&quot;&gt;2. Creating and publishing a layout&lt;/h3&gt;

&lt;p&gt;To reduce code duplication, I copied the layout from the basic
&lt;a href=&quot;https://github.com/in-toto/demo&quot;&gt;in-toto demo&lt;/a&gt;, which starts
from a git repository and finally creates a tarball called
“demo-package.tar.gz”. This is our bottle of pills.&lt;/p&gt;

&lt;p&gt;However, before we build all this, we want to create an in-toto layout: a
policy file that describes what our supply chain should look like. This time,
run the script that publishes it:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ python scripts/publish.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This will do the following:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Create this layout and sign it with the root of trust&lt;/li&gt;
  &lt;li&gt;Add it to a special location in the TUF repository&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Done: a couple more lines and now you have a TUF repository publishing
in-toto layouts. We will use these layouts later when we want to make sure our
pipeline was followed to the letter.&lt;/p&gt;
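To get a feel for what such a layout expresses, here is a heavily trimmed, illustrative sketch. The field names follow the in-toto spec, but key material and the signature wrapper are omitted, so don’t treat this as the complete schema:

```python
# Illustrative, abridged in-toto layout: who may run each step, and how
# artifacts must flow between steps (signatures and real keys omitted)
layout = {
    "_type": "layout",
    "expires": "2021-01-01T00:00:00Z",
    "keys": {"<functionary-keyid>": "..."},  # public keys of trusted parties
    "steps": [
        {
            "name": "package",
            # inputs here must match the products of an earlier write-code step
            "expected_materials": [["MATCH", "*", "WITH", "PRODUCTS", "FROM", "write-code"]],
            "expected_products": [["CREATE", "demo-package.tar.gz"]],
            "pubkeys": ["<functionary-keyid>"],
            "threshold": 1,
        }
    ],
    "inspect": [],
}
```

The artifact rules (MATCH, CREATE) are the interesting bit: they are what lets verification tie each step’s outputs to the next step’s inputs.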

&lt;p&gt;Now that you have something published, it may be a good time to serve the
content using the accompanying script:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ bash create_server.sh 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Run it in another terminal so you can see your TUF repo in action.&lt;/p&gt;

&lt;h3 id=&quot;3-carry-our-the-pipeline&quot;&gt;3. Carry out the pipeline&lt;/h3&gt;

&lt;p&gt;Now, we do our usual stuff: code some, pre-commit some stuff, package it and
put it somewhere so people can download it. So let’s do just that, but use
in-toto tooling to create cryptographic attestations of what happened so we can
build an audit trail. This was shamelessly copied from the in-toto demo as
well, modulo a small wrapper to &lt;em&gt;also&lt;/em&gt; put these attestations in our TUF
repository. You can run:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ python scripts/run.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And see it all happen with your own eyes.&lt;/p&gt;
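Each step run this way leaves behind a “link” attestation. Roughly, and hypothetically simplified (real links are signed and carry actual hashes where the placeholders are), a link for the packaging step records:

```python
# Simplified, unsigned sketch of the link metadata a step leaves behind
link = {
    "_type": "link",
    "name": "package",
    "command": ["tar", "czf", "demo-package.tar.gz", "demo-package"],
    "materials": {"demo-package/foo.py": {"sha256": "<hash of the input>"}},
    "products": {"demo-package.tar.gz": {"sha256": "<hash of the output>"}},
    "byproducts": {"stdout": "", "stderr": "", "return-value": 0},
}
```

These links are what in-toto verification later matches against the layout’s expected_materials and expected_products rules.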

&lt;h3 id=&quot;4-now-verify-its-all-together&quot;&gt;4. Now verify it’s all together&lt;/h3&gt;

&lt;p&gt;At last, it’s time to consume our package. To do this, we will download things,
then make sure they are kosher and finally open it to unwrap what’s inside.
This is very similar to how things in the meatspace work. Let’s think about our
bottle of pills again:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;You grab your new bottle of pills from your medicine cabinet&lt;/li&gt;
  &lt;li&gt;You ensure that the tamperproof seal around it is intact, and that the
expiration dates are correct (TUF)&lt;/li&gt;
  &lt;li&gt;If you want to be extra sure, you also check that there’s an FDA
approval seal and a lot number next to the expiration date (in-toto)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once you notice all these things, then you go ahead and open the bottle.&lt;/p&gt;

&lt;p&gt;Software shouldn’t be too different. In fact, this is what our downloader will
do. In something shy of 30 lines, it will first use TUF to connect to our
repository and download our package, then it will notice there is other
information attached to it and download that too. Once it’s all downloaded,
it will run in-toto verification on it. If everything is ok, then
you can happily consume your new package:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ tar xvf demo-package.tar.gz &amp;amp;&amp;amp; python demo-package/foo.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That’s it!&lt;/p&gt;

&lt;h2 id=&quot;where-do-we-go-from-here&quot;&gt;Where do we go from here?&lt;/h2&gt;

&lt;p&gt;There are a lot of resources to work with in-toto and TUF. I usually put things into two buckets. You can do things &lt;em&gt;with&lt;/em&gt; TUF and in-toto or you could do things &lt;em&gt;for&lt;/em&gt; TUF and in-toto.&lt;/p&gt;

&lt;p&gt;For the former, you can set up a repository and play with things more by
tweaking these scripts. You can also take a look at the documentation and
explore ways to add these tools to your environments and ecosystems. Shoot an
email to the &lt;a href=&quot;mailto:theupdateframework@googlegroups.com&quot;&gt;TUF&lt;/a&gt; or
&lt;a href=&quot;mailto:in-toto-public@googlegroups.com&quot;&gt;in-toto&lt;/a&gt; lists if you ever run into issues, as
we’d be more than happy to help.&lt;/p&gt;

&lt;p&gt;If you are interested in also developing &lt;em&gt;for&lt;/em&gt; in-toto or TUF, these
communities are super, super welcoming, and I’d encourage you to reach out and
play with things.  There are already some labels for newcomers in the issues.
Here are some places where we could use more hands:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;We have a &lt;a href=&quot;https://github.com/in-toto/in-toto-golang&quot;&gt;golang library&lt;/a&gt; that
needs more love: it’s a somewhat feature-complete implementation, but it
could use more work on the non-core in-toto features. If you like Go, I
think this is a great place to leave a mark.&lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;https://github.com/jenkinsci/in-toto-plugin&quot;&gt;Jenkins Plugin&lt;/a&gt; and the
&lt;a href=&quot;https://github.com/in-toto/in-toto-webhook&quot;&gt;Kubernetes Admission controller&lt;/a&gt;
are also great places to work with things. I’d suggest you take a
look at the repositories and play with them a little bit.&lt;/li&gt;
  &lt;li&gt;Anything you’d like really. If you have ideas on how to make things better
I’m sure that we’d be more than happy to hear them.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I hope this all helps, and see you around!&lt;/p&gt;

&lt;h2 id=&quot;acknowledgements&quot;&gt;Acknowledgements&lt;/h2&gt;

&lt;p&gt;So, this is nothing new; you can read how Datadog does this very well
&lt;a href=&quot;https://www.datadoghq.com/blog/engineering/secure-publication-of-datadog-agent-integrations-with-tuf-and-in-toto/&quot;&gt;here&lt;/a&gt;.
This blogpost and demo are of course inspired by the work of Trishank Kuppusamy,
who made it all happen on the Datadog side of things. You may also want to read how this is getting encoded into an in-toto ITE &lt;a href=&quot;https://github.com/in-toto/ITE/pulls/4&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;I’m not a chemist and I’m not going to give you any bad ideas so the example chemicals are left as an exercise to the reader. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;</content><author><name></name></author><summary type="html">I’ve been speaking quite a lot with quite a lot of people about the benefits of in-toto and TUF together. Indeed, my reaction after saying “hey, you don’t need TUF to use in-toto”, is “but they do go really well together”. I’ve done it so much that by now I have a very well rehearsed canned answer as to why they go well. It was only a matter of will and free time (look ma! I’m a Doctor now!), before I decided to dust off this blog and probably share why it matters and — more importantly — how you can see it for yourself in four easy steps.</summary></entry><entry><title type="html">Creating a web-enabled USB drive with WebUSB</title><link href="https://badhomb.re//webusb/u2f/2fa/2017/11/29/webusb.html" rel="alternate" type="text/html" title="Creating a web-enabled USB drive with WebUSB" /><published>2017-11-29T20:00:00-05:00</published><updated>2017-11-29T20:00:00-05:00</updated><id>https://badhomb.re//webusb/u2f/2fa/2017/11/29/webusb</id><content type="html" xml:base="https://badhomb.re//webusb/u2f/2fa/2017/11/29/webusb.html">&lt;p&gt;I got caught in the crossfire of adapting one of my projects
(PolyPasswordHasher, if you’re curious) to support two factor authentication
recently. One of the goals that I had prepared for the summer was to have an
actual demo website in which someone could register a yubikey and log in to a
website using PPH + HOTP (I’ll leave the reason as to why HOTP out of this
post) without too much hassle.&lt;/p&gt;

&lt;p&gt;Sadly, the ecosystem for browser USB extensions feels like a wasteland:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;You could write a plugin, but that’s incredibly insecure and close to being
deprecated in one or two years.&lt;/li&gt;
  &lt;li&gt;You could use chrome’s USB extension library but, guess what, it’s also going
to be deprecated.&lt;/li&gt;
  &lt;li&gt;You can also try to ship a binary with a browser extension, but that would
cause more cross-platform compatibility problems than I’d like to list here.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This leaves us with a somewhat experimental technology: WebUSB.&lt;/p&gt;

&lt;h2 id=&quot;enter-webusb&quot;&gt;Enter WebUSB&lt;/h2&gt;

&lt;p&gt;WebUSB is a (finally!) standardized technology to provide a USB bridge so
websites can connect to users’ USB devices using JavaScript. You can look at it
as if the website were providing you with a USB driver along with the two
tonnes of JQuery it uses to make rounded boxes on your site.&lt;/p&gt;

&lt;p&gt;This may sound like a security nightmare, at least on first impression.
Shipping code that has access to the user’s hardware sounds somewhat
problematic. However, webUSB is an improvement security-wise if you consider
the previous alternatives:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;It’s not code running outside of a sandbox, like a flashplugin would be.&lt;/li&gt;
  &lt;li&gt;Permissions must be granted by the user to allow a website to access a usb
device &lt;em&gt;explicitly&lt;/em&gt;.&lt;/li&gt;
  &lt;li&gt;Some devices, like USB keyboards, are not accessible to webusb (e.g., to
avoid keylogging)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That being said, I wouldn’t be surprised if someone finds a way to abuse it
during these early stages.&lt;/p&gt;

&lt;p&gt;Besides the security aspects of webusb, the only drawback that I found is that,
well, there is not much documentation on how to write a webusb device handler.
Here, I’ll document how I ‘reversed’ the protocol (I could’ve just read the
code for their open-source tools, but that’s not fun) and wrote a webusb
driver for a yubikey with HOTP enabled.&lt;/p&gt;

&lt;h2 id=&quot;setting-up-your-dev-environment&quot;&gt;Setting up your dev environment&lt;/h2&gt;

&lt;p&gt;In order to develop for webusb, you need to move a couple of things around.
First, you need the latest(-ish?) version of chromium. Second, you need to run
it with a couple of flags and a local webserver to serve your webusb JavaScript
files. Third, you may need to turn on a couple of flags within chromium to
enable experimental features.&lt;/p&gt;

&lt;p&gt;Start chromium like this:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ chromium --disable-web-security --allow-insecure-localhost
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The reason is that we’ll be serving the files using a plain http
server from python (you can use whatever makes you happy here, though). By
default, webusb is not enabled unless the content is served through HTTPS and
the certificate is trusted (++ for security here). A complete list of flags can
be found at
&lt;a href=&quot;https://peter.sh/experiments/chromium-command-line-switches/&quot;&gt;this site&lt;/a&gt;
in case you’re curious, although you don’t need more than these two.&lt;/p&gt;

&lt;p&gt;Finally, depending on how old your version of chromium is, you may need to
enable the experimental features by navigating to chrome://flags and enabling a
flag called “Experimental web platform features.” If you have done this, then
you will need to restart your browser.&lt;/p&gt;

&lt;p&gt;After setting up chromium, you can start serving your local WebUSB files like
so:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ python3 -m http.server
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Cool! Now you should be able to navigate to localhost and play around with
webusb and your device.&lt;/p&gt;

&lt;h2 id=&quot;sniffing-the-usb-device&quot;&gt;Sniffing the USB device&lt;/h2&gt;

&lt;p&gt;Another necessary task is to understand what the original usb driver sends
to the USB device in order to replicate it. The device you target may already
have an implementation using other libraries (e.g., libusb), or a
specification describing these tasks, but you may also run into devices that
are not documented (again, this wasn’t the case for the yubikey, but I still
opted not to check the docs). If you’re in that last case and there are no
documents on how to interact with your device, a simple pcap using wireshark
can work wonders.&lt;/p&gt;

&lt;h3 id=&quot;setting-up-wireshark-for-usb-sniffing&quot;&gt;Setting up wireshark for USB sniffing&lt;/h3&gt;

&lt;p&gt;Under linux, wireshark needs to have a couple of modules loaded and permissions
changed so you can sniff usb traffic. The instructions are taken from
&lt;a href=&quot;https://wiki.wireshark.org/CaptureSetup/USB&quot;&gt;this article&lt;/a&gt;,
but I’ll inline the linux instructions anyway.&lt;/p&gt;

&lt;p&gt;First, load the usbmon kernel module:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo modprobe usbmon
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This will create a series of /dev/usbmonN devices. You need to make them
readable by regular users:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ sudo setfacl -m u:USER:r /dev/usbmon*
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Having done this, you can launch wireshark and pick an interface to sniff.
Which one to pick can be easily seen using dmesg. Run dmesg in follow mode
(dmesg -w) and then plug in your device. You should see something like this:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[ 6350.949823] usb 1-4: new full-speed USB device number 9 using xhci_hcd 
[ 6351.093360] input: Yubico Yubikey NEO OTP+CCID as /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4:1.0/0003:1050:0111.0006/input/input22 
[ 6351.150902] hid-generic 0003:1050:0111.0006: input,hidraw0: USB HID v1.10 Keyboard [Yubico Yubikey NEO OTP+CCID] on usb-0000:00:14.0-4/input0 
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The important bit of this part of the log is the “usb 1-4” part. This means
the device was connected to the usbmon1 interface. You can also tell what
“address” Wireshark will use from the information on the rest of the line
(1.9.x). A sample wireshark capture of a packet going to our USB device would
look like this:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;398 19.946717 host 1.9.0 USB 72 URB_CONTROL out
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This was a packet sent from your computer to 1.9.0, the device that was just
connected. Using Wireshark, we can capture the “conversation” between the
laptop and the Yubikey (or any other device) and sort of tell what exactly is
being sent and received.&lt;/p&gt;

&lt;p&gt;In this case the host sends a series of URB_CONTROL out message(s) with certain
flags and the challenge to hash, then waits for a status flag to be set on the
replies and starts reading the resulting HOTP hash. You can see the relevant
bits of the conversation on packets 8 to 47 in
&lt;a href=&quot;https://ptpb.pw/0Pnn.pcapng&quot;&gt;this pcap&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;translating-sniffed-packets-into-webusb-calls&quot;&gt;Translating Sniffed packets into webusb calls&lt;/h2&gt;

&lt;p&gt;Now that we know what we need to do, we can try to replicate the behavior using
webusb to interact with our devices. For example, the details of the packet I
listed above are as follows:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/pcap.png&quot; alt=&quot;pcap&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This can be translated into the following webusb call:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Device.controlTransferOut({
    &quot;recipient&quot;: &quot;interface&quot;,
    &quot;requestType&quot;: &quot;class&quot;,
    &quot;request&quot;: 9,
    &quot;value&quot;: 0x0300,
    &quot;index&quot;: 0 }, Data);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;You may suspect that some of the values on the wireshark capture map directly
to the arguments sent to the control transfer out. Well, it is that simple. If
you don’t want to understand what these values mean (I certainly won’t cover
them here), you can just blindly build the same request and see how the device
behaves.&lt;/p&gt;

&lt;p&gt;These calls return a promise object, which resolves with the data that the
device contains after our call. We would have to chain these promises to
effectively have a conversation with our yubikey. However, this may not be as
straightforward as with other approaches.&lt;/p&gt;

&lt;h3 id=&quot;fun-with-promises&quot;&gt;Fun with promises&lt;/h3&gt;

&lt;p&gt;The webusb API’s reliance on promises makes writing driver-like code a
little weird. This is because webusb is merging two worlds: the weirdly
“asynchronous” JavaScript web-space and the structured-protocol,
raw-byte-handling world of low-level device interaction. This combination will
often lead to a design pattern: nested promises. At least in my very humble
opinion.&lt;/p&gt;

&lt;p&gt;A nested webusb promise, in simple terms, is something that does the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Start a promise by sending a request. The result will be handled by another promise&lt;/li&gt;
  &lt;li&gt;The second promise will check whether the request is ready (i.e., the read
frame says “good to go”):
    &lt;ul&gt;
      &lt;li&gt;If it’s not ready, start another promise exactly like the one in step 2.&lt;/li&gt;
      &lt;li&gt;If it is ready, then move on and resolve the “outer” promise so we can
continue onwards to the next step.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
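The pattern above can be sketched with a mock device — no real WebUSB calls here, `fakeDevice` just pretends to need a few polls before its status flag flips:

```javascript
// Mock device: reports "not ready" for a couple of reads, then hands over data
const fakeDevice = {
  pollsLeft: 3,
  controlTransferIn() {
    this.pollsLeft -= 1;
    return Promise.resolve({ ready: this.pollsLeft <= 0, data: "hotp-result" });
  },
};

// Inner chain: re-issue the read until the frame says "good to go"
function waitUntilReady(device) {
  return device.controlTransferIn().then((frame) =>
    frame.ready ? frame.data : waitUntilReady(device)
  );
}

// Outer chain: once the poll resolves, move on to the next protocol step
waitUntilReady(fakeDevice).then((data) => console.log(data)); // prints "hotp-result"
```

In real code, `controlTransferIn` would be the WebUSB method of the same name and the ready check would inspect the status bytes in the returned frame.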

&lt;p&gt;This may be easier to picture in the diagram below.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/promise-chain-1.png&quot; alt=&quot;promise chain&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This construction makes it so that the outer promise can build a promise
chain that follows a structured protocol, such as the one used by the Yubikey
HOTP interface. Meanwhile, the inner promise chain makes the outer promise
hold on until the device is ready for the next step.&lt;/p&gt;

&lt;p&gt;This way, we can write a usb device handler that looks pretty much like the
drivers you would write with libusb, but using the async and pretty
JavaScript-y API of webusb.&lt;/p&gt;

&lt;p&gt;Writing webusb handlers/drivers is rather fun and easy once you get started.
Reversing usb devices is a fun side-project that may keep you interested/busy
for a good week while learning a little about the things we plug into our
computers every day.&lt;/p&gt;</content><author><name></name></author><summary type="html">I got caught in the crossfire of adapting one of my projects (PolyPasswordHasher, if you’re curious) to support two factor authentication recently. One of the goals that I had prepared for the summer was to have an actual demo website in which someone could register a yubikey and log in to a website using PPH + HOTP (I’ll leave the reason as to why HOTP out of this post) without too much hassle.</summary></entry><entry><title type="html">Looking at the Git landscape through SHATTERED glass</title><link href="https://badhomb.re//git/sha1/rant/2017/03/04/shattered.html" rel="alternate" type="text/html" title="Looking at the Git landscape through SHATTERED glass" /><published>2017-03-04T20:00:00-05:00</published><updated>2017-03-04T20:00:00-05:00</updated><id>https://badhomb.re//git/sha1/rant/2017/03/04/shattered</id><content type="html" xml:base="https://badhomb.re//git/sha1/rant/2017/03/04/shattered.html">&lt;p&gt;A recent &lt;a href=&quot;https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html&quot;&gt;blogpost&lt;/a&gt; 
from Google and CWI showed us what many had suspected would happen soon: a
practical attack on SHA-1 could be successfully carried out.  Although this is
an important milestone for the history of cryptographic hash algorithms (if
that’s even a thing), the practical implications are more nuanced. As it is
with the emerging trend of branded vulnerabilities — (this one is called
&lt;a href=&quot;https://shattered.io/&quot;&gt;shattered&lt;/a&gt;) — the details are lost in a sea of PR-littered vacuity and witty
names for vulnerabilities.&lt;/p&gt;

&lt;p&gt;Among the long list of “broken” applications, there is Git, probably the most
widely used version control system. Given this popularity, it is not surprising
that many people have been running around flailing their arms, writing &lt;a href=&quot;https://arstechnica.com/security/2017/02/watershed-sha1-collision-just-broke-the-webkit-repository-others-may-follow/&quot;&gt;risible
headlines&lt;/a&gt;
and tweeting the
&lt;a href=&quot;https://twitter.com/bcrypt/status/834762918692483073&quot;&gt;not&lt;/a&gt;–&lt;a href=&quot;https://twitter.com/bascule/status/836298123408388096&quot;&gt;so&lt;/a&gt;–&lt;a href=&quot;https://twitter.com/andywingo/status/835132154749272064&quot;&gt;amusing&lt;/a&gt;–&lt;a href=&quot;https://twitter.com/realhashbreaker/status/835199945804238848&quot;&gt;anymore&lt;/a&gt;
“securely holier than thou”
&lt;a href=&quot;https://twitter.com/NathOnSecurity/status/834796736308793344&quot;&gt;tweets&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;At first sight, the concern is well founded. A collision in SHA-1 allows
attackers to replace files in such a way that git cannot identify the
change — not even with signing enabled. Of course, this would let resourceful attackers
sneak in backdoored versions of files after a repository compromise, or create
a signed commit object that wasn’t made by the author of the signature.
However, after further inspection, the picture is not that grim.&lt;/p&gt;

&lt;p&gt;The truth is, as I will show below, SHA-1 is broken in such a way that
performing any of these attacks is infeasible, at least from an economic
standpoint. Throughout this post, I’ll show what a doomsday scenario would
look like for git if someone really, really wanted to attack it by colliding hashes.&lt;/p&gt;

&lt;h2 id=&quot;background-or-wait-wasnt-this-already-very-really-broken&quot;&gt;Background (or “wait, wasn’t this already very, really broken?!”)&lt;/h2&gt;

&lt;p&gt;Before digging into the details of git and its use of SHA-1, I wanted to make a
brief rundown of what actually happened this last Thursday, and try to place it
in the context of other hash functions.&lt;/p&gt;

&lt;p&gt;The truth is, SHA-1 was already broken, as announced by &lt;a href=&quot;https://mail.python.org/pipermail/python-dev/2005-December/058850.html&quot;&gt;Rivest&lt;/a&gt; back in 2005.
The reason for this conclusion is that, although the names are not similar,
SHA-1’s construction is practically the same as MD5’s. This construction,
Merkle-Damgård, is the one used by both hash algorithms and the reason an
identical-prefix collision attack is possible. It is no surprise that some rockstars
in the industry came out, unsurprised themselves, to point out that NIST
had already deprecated SHA-1 six years ago.&lt;/p&gt;

&lt;p&gt;This comparison with MD5 is a really good starting point for understanding the
nature of this “news” and how it applies to the use of SHA-1. Here is the
timeline of the practical attacks against MD5:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;In 1996, collisions were found in the compression function of MD5. Since
then, experts have recommended staying away from MD5.&lt;/li&gt;
  &lt;li&gt;In 2005, researchers were able to create two Postscript files that collided.
When this happened, Rivest came out to &lt;a href=&quot;https://mail.python.org/pipermail/python-dev/2005-December/058850.html&quot;&gt;declare both MD5 and SHA-1 broken in
terms of collision resistance&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;In 2007, Marc Stevens (the name should sound familiar) wrote hashclash, which &lt;a href=&quot;https://natmchugh.blogspot.com/2014/10/how-i-created-two-images-with-same-md5.html&quot;&gt;Nat McHugh&lt;/a&gt;
later used to collide the hashes of two PNG files.&lt;/li&gt;
  &lt;li&gt;In 2008, the &lt;a href=&quot;https://arstechnica.com/security/2008/12/theoretical-attacks-yield-practical-attacks-on-ssl-pki/&quot;&gt;CCC was able to impersonate a CA by colliding its MD5 hash&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;In 2012, &lt;a href=&quot;https://arstechnica.com/security/2012/06/flame-wields-rare-collision-crypto-attack/&quot;&gt;the Flame malware used an MD5 collision&lt;/a&gt; to fake a certificate owned by Microsoft.&lt;/li&gt;
  &lt;li&gt;In 2017, &lt;a href=&quot;https://blogs.oracle.com/stevenChan/entry/jar_md5&quot;&gt;people still use MD5, god knows why&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As I hinted before, the timeline for SHA-1 is (and will be) incredibly similar
to the one for MD5, with the current events matching those of the
year 2005. A collision in a PS/PDF file is pretty much the same story, as these file
formats are not as brittle as others: they allow random data to be located
somewhere in the file without showing it or damaging the file format in any
way. Other file formats, such as X.509 certificates, source code files and
lossy-compression images, are not so forgiving, which greatly reduces the
feasibility of a collision. I’ll elaborate more on this fact later.&lt;/p&gt;

&lt;p&gt;Other similarities also arise: approximately 10 years after people declared MD5
broken, it was broken in practice. This time, it took us 12 years to go from
warning to being able to produce two files that hash to the exact same value. The final
similarity is rather obvious: thanks to Moore’s law and further research,
attacks are only going to become more effective. There is no reason to continue
using SHA-1 for newer applications.&lt;/p&gt;

&lt;p&gt;The lesson to take from this comparison is that this marks a milestone in the
usual life of a hash algorithm: it may put a nail in the coffin, but the
algorithm is not completely dead for applications that rely on it today (as
is the case with git).&lt;/p&gt;

&lt;p&gt;Regardless of this fact, I still wanted to sketch a doomsday scenario for git, so you
will have it. To build it, I have to give you a little bit of background on git.&lt;/p&gt;

&lt;h2 id=&quot;how-does-git-work&quot;&gt;How does git work?&lt;/h2&gt;

&lt;p&gt;The information we need from git to carry out our attack is minimal. I only need to
describe the file formats, and from those we will pick the best point to
wreak havoc in a git repository (not really, just on paper).&lt;/p&gt;

&lt;p&gt;A git repository is mostly made of two types of files: references and objects.
To keep things brief, I’ll skip the details of the former. It suffices to say
that git references, like in a programming language, are pointers to other
entities; in this case, they are pointers to git objects. An example of a
reference is a branch, which points to a git commit object.&lt;/p&gt;

&lt;p&gt;Git objects hold information about a repository. As of today, there are four of
them:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Git commit objects: these hold information about a revision in the
repository. Inside a commit object you will find the author, the commit
message, a date, and other metadata. However, the most important bit of
information is the id of a root tree object.&lt;/li&gt;
  &lt;li&gt;A tree object is akin to a folder in a filesystem: it contains a listing
of other tree objects and blob objects (along with information about them, as
shown below).&lt;/li&gt;
  &lt;li&gt;A blob object which, as you may have guessed, is a file. This contains the
size of the file and the contents of the file itself.&lt;/li&gt;
  &lt;li&gt;Finally, there are tag objects, which are pretty much like commit objects,
but they are meant to point to a static position in the repository (e.g.,
release v1.0). Git tags are usually signed using GPG to ensure the
authenticity of the tag and all the files to which it indirectly points.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is in the IDs of all of these objects where the SHA-1 function is used. To
obtain the id of an object, you simply hash its header and contents. By
creating a new object that shares a SHA-1 with a known object, we can
perform our attack.&lt;/p&gt;
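&lt;p&gt;This computation is easy to reproduce. Here’s a minimal Python sketch (the helper name is mine, not git’s) of how an object id is derived:&lt;/p&gt;

```python
import hashlib

def git_object_id(obj_type, content):
    # git prepends a header (the object type, a space, the content
    # length in decimal ASCII, and a null byte) and SHA-1s the result
    header = obj_type.encode() + b" " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# The id of a blob holding "hello\n"; this is the same value that
# `echo hello | git hash-object --stdin` prints.
print(git_object_id("blob", b"hello\n"))
# ce013625030ba8dba906f756967f9e9ca394464a
```

&lt;p&gt;Any replacement object an attacker crafts must come out of this function with the same digest as its victim.&lt;/p&gt;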

&lt;p&gt;For example:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;We could create a new commit object with the same id and replace it. This
would let us replace the whole root tree and pretty much serve another
repository.&lt;/li&gt;
  &lt;li&gt;We could also create a new root tree object with the same id and replace
it. This would let us replace all or any of the files in the repository.&lt;/li&gt;
  &lt;li&gt;We could also create a file that hashes to the same value (with a minor
caveat that I’ll cover later) and replace the blob object in the repository.&lt;/li&gt;
  &lt;li&gt;We could do to a tag object the same thing we did to a commit.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Of the attacks that I outlined above, only the third is really feasible. The
reason lies in the fact that the git commit/tree object
format is rather brittle. Unlike PDF and PS files, you cannot put many bytes
of random junk somewhere and expect git not to complain about a corrupt object.&lt;/p&gt;

&lt;p&gt;This pretty much leaves us with the ability to replace blobs, which should be
enough to do something evil. Let me elaborate on this a little bit more.&lt;/p&gt;

&lt;h2 id=&quot;random-junk-and-the-probability-of-getting-the-right-junk&quot;&gt;Random Junk, and the probability of getting the right Junk&lt;/h2&gt;

&lt;p&gt;When carrying out the attack described in the paper, the researchers exploited
the fact that, under the Merkle-Damgård construction, two files will keep producing
the same hash from the point their internal states collide onward. For this, you
need to set a common prefix shared between both files.&lt;/p&gt;

&lt;p&gt;Immediately after the prefix come two blocks of random-looking bytes. The first
block is used to drive the hash function into a certain state called a “near-collision.”
The second block of random bytes is used to turn this into a complete collision that
also allows for any kind of (identical) suffix. These blocks only need to be found
once, so you can create your own colliding PDFs using the published blocks.&lt;/p&gt;

&lt;p&gt;However, these collisions require the same prefix length, a chosen prefix, and a
place to locate 84 bytes of junk after the prefix in such a way that it doesn’t
result in a corrupted file. It is the placement of this junk that gets in the
way with brittle file formats. For example, in a git tree, any junk byte that falls
outside the format for the file list would corrupt the object:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ cat .git/objects/da/3d3fc569dc3ded6c67e5209840ff4205202613 | zlib-flate -uncompress | hexdump -C
00000000 74 72 65 65 20 31 33 37 00 31 30 30 37 35 35 20 |tree 137.100755 |
00000010 74 68 65 5f 73 6f 6f 74 68 69 6e 67 5f 73 6f 75 |the_soothing_sou|
00000020 6e 64 5f 6f 66 5f 68 61 73 68 5f 63 6f 6c 6c 69 |nd_of_hash_colli|
00000030 73 69 6f 6e 73 2e 70 79 00 45 ca db ed b6 09 fd |sions.py.E......|
00000040 cc 29 c5 5b 79 54 83 ba 6a c6 b1 f9 b0 31 30 30 |.).[yT..j....100|
00000050 36 34 34 20 74 68 65 5f 73 6f 6f 74 68 69 6e 67 |644 the_soothing|
00000060 5f 73 6f 75 6e 64 5f 6f 66 5f 68 61 73 68 5f 63 |_sound_of_hash_c|
00000070 6f 6c 6c 69 73 69 6f 6e 73 2e 77 61 76 00 0e 0a |ollisions.wav...|
00000080 e5 a7 bf 61 78 d4 90 12 7b 2c 74 0e 78 34 b0 85 |...ax...{,t.x4..|
00000090 cf 59 |.Y|
00000092
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For a tree object to be valid, you need:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The ASCII word “tree” (this could easily be part of a prefix), followed by a space, the size in decimal ASCII, and a null byte&lt;/li&gt;
  &lt;li&gt;Then a list of entries, each consisting of:
    &lt;ul&gt;
      &lt;li&gt;the octal ASCII digits of the mode (permissions and file type), followed by a space&lt;/li&gt;
      &lt;li&gt;the filename, followed by a null byte&lt;/li&gt;
      &lt;li&gt;20 raw bytes that point to another existing git object (submodules are represented by a commit)&lt;/li&gt;
      &lt;li&gt;and then, immediately, the next entry&lt;/li&gt;
    &lt;/ul&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
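&lt;p&gt;To make the brittleness concrete, here is a minimal Python sketch (the helper names are mine) of how such a tree object is serialized and hashed; any junk bytes would have to survive this exact layout:&lt;/p&gt;

```python
import hashlib

def tree_entry(mode, name, object_id_hex):
    # one entry: ASCII mode digits, a space, the filename, a null
    # byte, and then the 20 raw bytes of the referenced object id
    return mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(object_id_hex)

def git_tree_id(entries):
    # the tree header is the word "tree", a space, the byte length
    # of the concatenated entries, and a null byte
    body = b"".join(entries)
    header = b"tree " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

# a one-file tree pointing at the well-known id of a blob holding "hello\n"
entry = tree_entry("100644", "hello.txt", "ce013625030ba8dba906f756967f9e9ca394464a")
print(git_tree_id([entry]))
```

&lt;p&gt;Everything in a tree is either fixed syntax or must point at a real object, which is exactly why there is nowhere to hide 84 bytes of junk.&lt;/p&gt;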

&lt;p&gt;That’s it; that’s all that appears in a tree object. If we were to place 84 bytes
of random junk somewhere there, we would have to be really lucky to find
colliding blocks that match the format for a tree (we can’t reuse the ones
computed for PDFs). The probability is pretty much 0.&lt;/p&gt;

&lt;p&gt;A more realistic approach would be to repeat this with a blob object. However,
this is still sensitive to the file format that we use. The header of a blob
object is only the word “blob”, the size of the blob (in bytes), and a null byte,
followed by the rest of the content. Because this header differs, we can’t reuse the same
&lt;a href=&quot;https://github.com/joeyh/supercollider&quot;&gt;colliding blocks from the PDFs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Luckily for us (remember we are attackers), it is feasible to account for this
header and find a new collision. Indeed, the easy way to do this would be to
just prepend the blob header to the PDF prefix and recompute
the collision. Causing collisions for other file formats would be harder: the
probability that the two blocks of 84 + 64 random bytes (assuming an even
distribution) come out all printable, so as to create a “meaningful” source code
file, floats around (97/256)^100 ≈ 7.3e-43 even if there were only 100 bytes in total.&lt;/p&gt;
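&lt;p&gt;As a sanity check on that order of magnitude, here’s a quick sketch under my simplified model that every byte of a 100-byte collision block must independently land on one of the 97 printable ASCII values:&lt;/p&gt;

```python
# each byte is uniform over 256 values; 97 of them are printable,
# and all 100 bytes must land in that set independently
p = (97 / 256) ** 100
print(p)  # roughly 7e-43
```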

&lt;p&gt;A colliding C file may look like this.&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/* prefix stuff */
#include &amp;lt;...&amp;gt;
int main (...) {
  /* BLOCK1 of ascii-printable crap
   *
   */

  /* BLOCK2 of ascii-printable crap
   *
   */

   evil_code(); // not part of the collision.

}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Notice we had to be super lucky not to have any nulls inside the comment, or
control characters, or the sequence */ that would terminate the comment early.&lt;/p&gt;

&lt;p&gt;Now, let’s assume we are that lucky, and that we are still willing to collide a
file, any file. We would still need to consider other factors. The first one
that came to mind is blob-lifetime.&lt;/p&gt;

&lt;h2 id=&quot;blob-lifetime&quot;&gt;Blob lifetime&lt;/h2&gt;

&lt;p&gt;Blob lifetime is what I dub the time between a blob’s inception and the
time it is replaced by another, newer blob. This is particularly relevant
because, if you are going to spend a year like these researchers did finding
a collision, you want to collide a blob that will still be in use by the
time you find it; otherwise the prefix may have changed underneath you. To collide an
arbitrary blob, an attacker would:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Take the blob header (that is, the word “blob”, the size, and a null byte) as a prefix, plus other things
that are usually static (e.g., some of the imports, maybe the license in the
comments, etc.)&lt;/li&gt;
  &lt;li&gt;Compute the colliding blocks immediately after, in such a way that they don’t
corrupt the file (here’s where they would have to get really lucky if this
were code).&lt;/li&gt;
  &lt;li&gt;Once a collision is found, pad it with whatever backdoor code they’d like and
create the evil blob. Another, non-evil blob should be sent to the repository
to create a valid entry, maybe through a non-malicious pull request,
although you would need a good explanation for your two block comments full of
random ASCII characters.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In the graph below you can explore the lifetime of blob objects for some
popular git repositories (if you’d like me to compute a dataset for your repo,
please ping me!). You can explore the files, where they appear, and the time
it took for them to be replaced. I also fit a linear equation to “estimate” the cost
of cracking a certain blob (assuming we could know how long a blob would
survive before being replaced), which you can see on hover.&lt;/p&gt;

&lt;iframe src=&quot;https://santiagotorres.github.io/blob_lifecycles/?dataset=git.json&quot; width=&quot;100%&quot; height=&quot;720&quot;&gt;&lt;/iframe&gt;

&lt;p&gt;You can see a bigger version of this graph &lt;a href=&quot;https://santiagotorres.github.io/blob_lifecycles/?dataset=git.json&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This graph is pretty interesting for many reasons. It sheds light on the low-hanging
fruit that an attacker might exploit. The files that are easy to
crack include, for example:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;LICENSE files: boring&lt;/li&gt;
  &lt;li&gt;Test code: also boring&lt;/li&gt;
  &lt;li&gt;Documentation: also boring&lt;/li&gt;
  &lt;li&gt;Vendored code (check the rightmost delta on the docker dataset): somewhat interesting&lt;/li&gt;
  &lt;li&gt;Images: I guess we could prank someone by replacing their logo with their
childhood pictures by spending 110 GPU-years to cause a collision or
something.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The second exercise that I’d like you to do is the following: pick any file
that you would like to crack on the rightmost side, and then write the name of
that file in the filter. You will most likely see that the file also appears on
the left. The reason this happens is that, as projects grow, code
churn increases with them. The newer versions of the files (and the ones that are
used in the latest revision) usually fall on the left margin.&lt;/p&gt;

&lt;p&gt;Result: you may be able to change the LICENSE file of a single-developer
project by spending 31k and two and a half years.&lt;/p&gt;

&lt;h2 id=&quot;sneaking-the-blob&quot;&gt;Sneaking the blob&lt;/h2&gt;

&lt;p&gt;Now, let’s say that we got lucky with the blob we spent our tuition money and
two years of our life on, because we really wanted to screw with that guy’s
.gitignore file. We did it: we have a blob that hashes to the same value. Now
what? The story is not really over.&lt;/p&gt;

&lt;p&gt;The way the git transport protocol works doesn’t let us replace the remote
blob. I won’t go into the details, but a cartoonish depiction of what a push
would look like is as follows:&lt;/p&gt;

&lt;div class=&quot;highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;gt; CLIENT: I'm pushing this commit, with this tree, and here's this new blob with id 0xCAFEC0FFEE
&amp;gt; SERVER: Oh, I already got 0xCAFEC0FFEE, but thanks
&amp;gt; CLIENT: =(
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;What alternatives do we have? Keeping the blob locally is useless, because we
want other people to use our blob. To get it out there, I imagine the following
alternatives:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;We could destroy all history on the git repository that contains all mentions
of the blob, and wait for it to get garbage-collected, and then push again —
noisy&lt;/li&gt;
  &lt;li&gt;We could, instead, trick people into getting their stuff from our repository —
we would need to be well known, and this would be pretty obvious&lt;/li&gt;
  &lt;li&gt;We could man in the middle the connection — Oh, why don’t we break GitHub’s
certificate fingerprints? Sadly, they don’t use SHA-1 anymore so we can’t do
that…&lt;/li&gt;
  &lt;li&gt;We could break into the repository, and change the file manually.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The story would maybe be different if the repository owner were the one acting
maliciously: since he is not a third party, he could just rewrite the history.
This may be the most dangerous case, for he could keep replacing blobs for as long as
the prefix doesn’t change.&lt;/p&gt;

&lt;p&gt;To be fair, of all of the highly unlikely things that I’ve listed here, at
least the third and fourth seem plausible. Let’s assume you did that. At this
point, you successfully hacked a git repository by causing a SHA-1 collision.&lt;/p&gt;

&lt;p&gt;Note that I presented this as a thought exercise, for the git folks are already
working on integrating the hardened SHA-1 code, and your attack would be
unsuccessful once it is merged. Shucks!&lt;/p&gt;

&lt;p&gt;To summarize:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;This attack is an important milestone in the evolution of SHA-1’s
deprecation.&lt;/li&gt;
  &lt;li&gt;This attack is not feasible against git. An arbitrary-prefix attack
would be more interesting, but we aren’t there yet.&lt;/li&gt;
  &lt;li&gt;Even if it were, collisions of certain files are harder than others, given the
code churn and the entropy of the repository’s files.&lt;/li&gt;
  &lt;li&gt;You would have to be really lucky to find two colliding blocks for files that
have brittle file formats (such as code or certain git objects).&lt;/li&gt;
  &lt;li&gt;Say you were lucky: there are better ways to mess with people’s LICENSE files.&lt;/li&gt;
  &lt;li&gt;To do some damage, you would target vendored code, but vendored code is
already a mess that should be handled better. There are other ways to abuse
this fact; just look at libtiff, zziplib and libwmf.&lt;/li&gt;
  &lt;li&gt;The story is not over once you compute the collision. You have to put the blob
in the right place. This often means either stealing a certificate or hacking
into a server. The story is of course different if you are a malicious
server, but things can go wrong in many other ways if that happens.&lt;/li&gt;
  &lt;li&gt;There is already work underway to both harden git’s use of SHA-1 and
replace the hashing algorithm, so don’t get your hopes up.&lt;/li&gt;
&lt;/ul&gt;</content><author><name></name></author><summary type="html">A recent blogpost from Google and CWI showed us what many had suspected would happen soon: a practical attack on SHA-1 could be successfully carried out. Although this is an important milestone for the history of cryptographic hash algorithms (if that’s even a thing), the practical implications are more nuanced. As it is with the emerging trend of branded vulnerabilities — (this one is called shattered) — the details are lost in a sea of PR-littered vacuity and witty names for vulnerabilities.</summary></entry></feed>