<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.1">Jekyll</generator><link href="https://lwthiker.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://lwthiker.com/" rel="alternate" type="text/html" /><updated>2022-06-19T18:07:06+00:00</updated><id>https://lwthiker.com/feed.xml</id><title type="html">lwt hiker</title><subtitle>Random thoughts about software, hacking and other things.</subtitle><entry><title type="html">HTTP/2 fingerprinting: A relatively-unknown method for web fingerprinting</title><link href="https://lwthiker.com/networks/2022/06/17/http2-fingerprinting.html" rel="alternate" type="text/html" title="HTTP/2 fingerprinting: A relatively-unknown method for web fingerprinting" /><published>2022-06-17T12:30:00+00:00</published><updated>2022-06-17T12:30:00+00:00</updated><id>https://lwthiker.com/networks/2022/06/17/http2-fingerprinting</id><content type="html" xml:base="https://lwthiker.com/networks/2022/06/17/http2-fingerprinting.html"><![CDATA[<p>HTTP/2 fingerprinting is a method by which web servers can identify which client is sending them a request<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. It can identify the browser type and version, for instance, or whether a script is used. The method relies on the internals of the HTTP/2 protocol, which are less widely known than those of its simpler predecessor, HTTP/1.1. In this post I will first give a short description of the HTTP/2 protocol, then provide details on how a web server can use the protocol’s various parameters to identify the client. Finally, I will list methods of checking and controlling a client’s HTTP/2 signature.</p>

<p>This is the second part of a two-part series about web fingerprinting. Read the previous post about TLS fingerprinting <a href="/networks/2022/06/17/tls-fingerprinting.html">here</a>.</p>

<h3 class="no_toc" id="table-of-contents">Table of contents</h3>

<ul id="markdown-toc">
  <li><a href="#back-to-http11" id="markdown-toc-back-to-http11">Back to HTTP/1.1</a></li>
  <li><a href="#a-short-introduction-to-http2" id="markdown-toc-a-short-introduction-to-http2">A short introduction to HTTP/2</a>    <ul>
      <li><a href="#frames-and-streams" id="markdown-toc-frames-and-streams">Frames and streams</a></li>
    </ul>
  </li>
  <li><a href="#client-fingerprinting-with-http2" id="markdown-toc-client-fingerprinting-with-http2">Client fingerprinting with HTTP/2</a>    <ul>
      <li><a href="#the-settings-frame" id="markdown-toc-the-settings-frame">The <code class="language-plaintext highlighter-rouge">SETTINGS</code> frame</a></li>
      <li><a href="#the-window_update-frame" id="markdown-toc-the-window_update-frame">The <code class="language-plaintext highlighter-rouge">WINDOW_UPDATE</code> frame</a></li>
      <li><a href="#the-headers-frame" id="markdown-toc-the-headers-frame">The <code class="language-plaintext highlighter-rouge">HEADERS</code> frame</a></li>
      <li><a href="#the-priority-frame" id="markdown-toc-the-priority-frame">The <code class="language-plaintext highlighter-rouge">PRIORITY</code> frame</a></li>
    </ul>
  </li>
  <li><a href="#where-is-http2-fingerprinting-being-used" id="markdown-toc-where-is-http2-fingerprinting-being-used">Where is HTTP/2 fingerprinting being used?</a></li>
  <li><a href="#controlling-your-http2-signature" id="markdown-toc-controlling-your-http2-signature">Controlling your HTTP/2 signature</a></li>
  <li><a href="#checking-a-clients-http2-signature" id="markdown-toc-checking-a-clients-http2-signature">Checking a client’s HTTP/2 signature</a>    <ul>
      <li><a href="#the-ts1-method-and-library" id="markdown-toc-the-ts1-method-and-library">The TS1 method and library</a></li>
    </ul>
  </li>
  <li><a href="#concluding" id="markdown-toc-concluding">Concluding</a></li>
</ul>

<h2 id="back-to-http11">Back to HTTP/1.1</h2>

<p>With HTTP/1.1 - the older, more familiar protocol - a client sends a textual request to the server (usually encrypted with TLS). Here’s what Chrome’s request looks like by default:</p>

<div class="language-http highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">GET</span> <span class="nn">/</span> <span class="k">HTTP</span><span class="o">/</span><span class="m">1.1</span>
<span class="na">Host</span><span class="p">:</span> <span class="s">www.wikipedia.org</span>
sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="101", "Google Chrome";v="101"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">User-Agent</code> header contains the client’s exact version and thus can be used to identify the client. However, it is easy to fake with any HTTP library or command-line tool, and it is no longer considered a reliable method of fingerprinting. A lesser-known fact is that the <code class="language-plaintext highlighter-rouge">Accept</code> header also takes different values depending on the client. However, this is also easy to fake.</p>

<h2 id="a-short-introduction-to-http2">A short introduction to HTTP/2</h2>

<p>HTTP/2 is a major revision of the HTTP protocol and has been around since 2015. About half of all websites now use HTTP/2<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, and basically all the popular sites use it by default. A great in-depth overview of the HTTP/2 protocol can be found <a href="https://web.dev/performance-http2/">in this article</a>. I will detail the parts most important to this article.</p>

<p>You can check if a website is running HTTP/2 with the Chrome/Firefox developer tools. For example, in Firefox it would look like the following:</p>

<p><img src="/assets/img/http2_lwthiker_com.png" alt="HTTP2 in Firefox dev tools" /></p>

<p>The primary goal of HTTP/2 is to improve the performance of websites and web applications. It achieves that goal by implementing a few core features:</p>
<ul>
  <li>Multiplexing - Multiple requests and responses can share the same TCP connection simultaneously, thus reducing the time to fetch sites with a large number of resources (images, scripts, etc.).</li>
  <li>Prioritization - HTTP/2 supports prioritizing certain requests and responses.</li>
  <li>Server push - In HTTP/2, the server can send resources to the client before the client requests them.</li>
</ul>

<p>The application semantics of the HTTP protocol are unchanged, however: it is still composed of the familiar request/response model with URIs, HTTP methods, HTTP headers and status codes.</p>

<h3 id="frames-and-streams">Frames and streams</h3>

<p>HTTP/2 is a binary protocol, as opposed to the textual HTTP/1.1. The messages in HTTP/2 are composed of <em>frames</em>, with ten types of frames serving different purposes. Frames are always part of a <em>stream</em>. A single stream is usually used to fetch a single resource from the server (html, script, image, etc.). Frames from multiple streams can be sent and received simultaneously, and thus multiplexing is achieved. A typical HTTP/2 connection would usually look like the following:</p>

<p><img src="/assets/img/sample_http2_connection.png" alt="Sample HTTP/2 connection" /></p>

<p>In this illustration the following frames are exchanged:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">SETTINGS</code> - This frame is the first frame sent by the client and contains HTTP/2-specific settings. It is part of stream 0, which is the default root stream. No resource is retrieved on stream 0.</li>
  <li><code class="language-plaintext highlighter-rouge">WINDOW_UPDATE</code> - Increases the window size of the receiver. More on this later.</li>
  <li><code class="language-plaintext highlighter-rouge">HEADERS</code> - Contains the actual request from the client to the server. It contains the URI, the HTTP method and the client’s HTTP headers.</li>
  <li><code class="language-plaintext highlighter-rouge">DATA</code> - Contains the response from the server with the requested resource’s data.</li>
</ul>
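
<p>To make the binary framing more concrete, here is a minimal Python sketch that decodes the fixed 9-byte header that precedes every HTTP/2 frame, following the layout in RFC 7540. The function name <code class="language-plaintext highlighter-rouge">parse_frame_header</code> and the example bytes are my own illustration, not part of any library.</p>

```python
# Frame type codes defined by RFC 7540, section 6.
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def parse_frame_header(header):
    """Decode the fixed 9-byte header that precedes every HTTP/2 frame."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    length = int.from_bytes(header[0:3], "big")  # 24-bit payload length
    frame_type, flags = header[3], header[4]     # one byte each
    # The top bit of the last 4 bytes is reserved; the modulo drops it
    # and leaves the 31-bit stream identifier.
    stream_id = int.from_bytes(header[5:9], "big") % 2**31
    return {
        "length": length,
        "type": FRAME_TYPES.get(frame_type, hex(frame_type)),
        "flags": flags,
        "stream_id": stream_id,
    }


# Hypothetical example bytes: a SETTINGS frame with a 24-byte payload on stream 0.
example = bytes([0x00, 0x00, 0x18, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00])
print(parse_frame_header(example))
```

<p>The payload that follows the header is interpreted according to the frame type, which is where the per-client differences discussed in this post show up.</p>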

<h2 id="client-fingerprinting-with-http2">Client fingerprinting with HTTP/2</h2>

<p>Let’s take a deeper look at some of the frames. Each of the frames contains information that allows clients to be easily fingerprinted by the server.</p>

<h3 id="the-settings-frame">The <code class="language-plaintext highlighter-rouge">SETTINGS</code> frame</h3>

<p>With the <code class="language-plaintext highlighter-rouge">SETTINGS</code> frame, the client informs the server about its HTTP/2 preferences.
There are six different settings<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> with which the client can control parameters such as the maximum number of concurrent streams, the maximum size of the header list, the default window size and whether it supports the server push feature.
Each HTTP/2 client uses a different set of settings. The same client will usually use the same set of settings regardless of what the actual HTTP request is.</p>

<p>To see what SETTINGS are sent by a client, I usually use <a href="https://nghttp2.org/documentation/nghttpd.1.html">nghttpd</a>, a small HTTP/2 server that can log these parameters.
Here are Chrome’s settings taken from the log:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>recv SETTINGS frame &lt;length=24, flags=0x00, stream_id=0&gt;
    [SETTINGS_HEADER_TABLE_SIZE(0x01):65536]
    [SETTINGS_MAX_CONCURRENT_STREAMS(0x03):1000]
    [SETTINGS_INITIAL_WINDOW_SIZE(0x04):6291456]
    [SETTINGS_MAX_HEADER_LIST_SIZE(0x06):262144]
</code></pre></div></div>

<p>Seen here are 4 different settings set by Chrome to some fixed values. Here are Firefox’s settings in comparison:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>recv SETTINGS frame &lt;length=18, flags=0x00, stream_id=0&gt;
    [SETTINGS_HEADER_TABLE_SIZE(0x01):65536]
    [SETTINGS_INITIAL_WINDOW_SIZE(0x04):131072]
    [SETTINGS_MAX_FRAME_SIZE(0x05):16384]
</code></pre></div></div>

<p>Both the kind of settings and their values are different, making the browsers easily distinguishable.
As another example, <code class="language-plaintext highlighter-rouge">curl</code> sets the <code class="language-plaintext highlighter-rouge">SETTINGS_ENABLE_PUSH</code> setting to 0 to disable the server push feature, which makes it distinguishable from a browser.
Because the settings aren’t easily controllable by the user, they become a reliable method for client fingerprinting.</p>
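
<p>As a rough illustration of how a server could turn this into a fingerprint, the sketch below parses a raw <code class="language-plaintext highlighter-rouge">SETTINGS</code> payload (a sequence of 16-bit identifier and 32-bit value pairs, per RFC 7540) and joins the pairs into a string. The payloads are rebuilt from the nghttpd logs shown above; the <code class="language-plaintext highlighter-rouge">id:value;...</code> string format and the function names are assumptions of mine, not an established standard.</p>

```python
import struct


def parse_settings(payload):
    """Split a SETTINGS payload into (id, value) pairs, keeping wire order."""
    if len(payload) % 6 != 0:
        raise ValueError("SETTINGS payload must be a multiple of 6 bytes")
    # Each entry is a 16-bit identifier followed by a 32-bit value,
    # both in network byte order.
    return list(struct.iter_unpack("!HI", payload))


def settings_fingerprint(payload):
    """Hypothetical compact form: 'id:value' entries joined by ';'."""
    return ";".join(f"{ident}:{value}" for ident, value in parse_settings(payload))


# Payloads rebuilt from the Chrome and Firefox logs above.
chrome = struct.pack("!HIHIHIHI", 1, 65536, 3, 1000, 4, 6291456, 6, 262144)
firefox = struct.pack("!HIHIHI", 1, 65536, 4, 131072, 5, 16384)

for name, payload in (("chrome", chrome), ("firefox", firefox)):
    print(name, len(payload), settings_fingerprint(payload))
```

<p>The payload lengths (24 and 18 bytes) match the <code class="language-plaintext highlighter-rouge">length</code> fields in the logs, since every setting occupies exactly 6 bytes. Keeping the wire order matters: both which settings are sent and the order they are sent in can differ between clients.</p>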

<h3 id="the-window_update-frame">The <code class="language-plaintext highlighter-rouge">WINDOW_UPDATE</code> frame</h3>

<p>HTTP/2 implements a mechanism for flow-control.
Flow-control gives the receiving side means to regulate the flow of traffic on a per-stream basis.
This is implemented using a window size, which is a number specifying how many bytes the receiver can process.
There is a window size for each stream and a window size for the connection as a whole.
This mechanism is pretty similar to TCP flow-control, but since multiple streams are multiplexed on top of a single TCP connection, HTTP/2 implements its own stream-level flow-control.
For a full explanation you may refer to the <a href="https://httpwg.org/specs/rfc7540.html#FlowControl">RFC</a> or to <a href="https://web.dev/performance-http2/#flow-control">this article</a>.</p>

<p>The stream-level default window size is controlled by the <code class="language-plaintext highlighter-rouge">SETTINGS_INITIAL_WINDOW_SIZE</code> in the <code class="language-plaintext highlighter-rouge">SETTINGS</code> frame, visible in the settings tables above.
You can observe above that Chrome uses 6MB (6291456) and Firefox uses 128KB (131072).</p>

<p>As the client receives data, it can replenish the window by sending a <code class="language-plaintext highlighter-rouge">WINDOW_UPDATE</code> frame, which increases the window size by a given increment.</p>

<p>The connection-level window size is 65535 bytes by default and can only be increased by sending a <code class="language-plaintext highlighter-rouge">WINDOW_UPDATE</code> frame on the special stream id 0.
Most clients will send a <code class="language-plaintext highlighter-rouge">WINDOW_UPDATE</code> frame for stream 0 right at the beginning of the connection, immediately after sending the <code class="language-plaintext highlighter-rouge">SETTINGS</code> frame. This is what it looks like for Chrome:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>recv WINDOW_UPDATE frame &lt;length=4, flags=0x00, stream_id=0&gt;
          (window_size_increment=15663105)
</code></pre></div></div>

<p>Chrome is in effect increasing the connection-level window size to 15MB (15663105+65535=15MB).
Firefox, on the other hand, will increase it to 12MB. <code class="language-plaintext highlighter-rouge">curl</code> uses 32MB<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. Hence this parameter can be used for fingerprinting as well.</p>
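
<p>The arithmetic is easy to check. The short sketch below adds the increment from the initial <code class="language-plaintext highlighter-rouge">WINDOW_UPDATE</code> frame to the 65535-byte default. Chrome’s increment is taken from the log above and Firefox’s from the TS1 example later in this post; the helper name is my own.</p>

```python
DEFAULT_CONNECTION_WINDOW = 65535  # initial connection-level window (RFC 7540)
MIB = 1024 * 1024


def connection_window(increment):
    """Connection-level window after a single WINDOW_UPDATE on stream 0."""
    return DEFAULT_CONNECTION_WINDOW + increment


# Increments as they appear in this post: Chrome's from the nghttpd log,
# Firefox's from the TS1 JSON example.
OBSERVED_INCREMENTS = {"Chrome": 15663105, "Firefox": 12517377}

for client, increment in OBSERVED_INCREMENTS.items():
    total = connection_window(increment)
    print(f"{client}: {total} bytes = {total / MIB:g} MiB")
```

<p>Both totals come out as whole multiples of 1 MiB (15 and 12 respectively), which suggests that each client picks a round target window and sends whatever increment is needed to reach it.</p>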

<h3 id="the-headers-frame">The <code class="language-plaintext highlighter-rouge">HEADERS</code> frame</h3>

<p>The <code class="language-plaintext highlighter-rouge">HEADERS</code> frame contains, broadly speaking, all the functionality of HTTP/1.1 in a single frame.
It contains the server’s host, the resource URI, the method (GET/POST/etc.) and the client’s headers.
An important difference, however, is that everything is now considered a “header”.
Here’s what it looks like for Chrome:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>recv (stream_id=3) :method: GET
recv (stream_id=3) :authority: localhost:8443
recv (stream_id=3) :scheme: https
recv (stream_id=3) :path: /favicon.ico
recv (stream_id=3) sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="101", "Google Chrome";v="101"
recv (stream_id=3) sec-ch-ua-mobile: ?0
recv (stream_id=3) user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36
recv (stream_id=3) sec-ch-ua-platform: "Linux"
recv (stream_id=3) accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
recv (stream_id=3) sec-fetch-site: same-origin
recv (stream_id=3) sec-fetch-mode: no-cors
recv (stream_id=3) sec-fetch-dest: image
recv (stream_id=3) accept-encoding: gzip, deflate, br
recv (stream_id=3) accept-language: en-GB,en;q=0.9
recv HEADERS frame &lt;length=121, flags=0x25, stream_id=3&gt;
</code></pre></div></div>

<p>The method is encoded in the special <code class="language-plaintext highlighter-rouge">:method</code> header, the host in <code class="language-plaintext highlighter-rouge">:authority</code>, the scheme in <code class="language-plaintext highlighter-rouge">:scheme</code> and the URI in <code class="language-plaintext highlighter-rouge">:path</code>.
The interesting thing here is that the order of these pseudo-headers is fixed but different for each client.
From the protocol’s standpoint all orders are valid, but each client has chosen to order them differently.
The header order for some common clients (using the first letter of each pseudo-header to denote it):</p>

<table>
  <thead>
    <tr>
      <th>Client</th>
      <th>Order</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Chrome</td>
      <td><code class="language-plaintext highlighter-rouge">masp</code></td>
    </tr>
    <tr>
      <td>Firefox</td>
      <td><code class="language-plaintext highlighter-rouge">mpas</code></td>
    </tr>
    <tr>
      <td>Safari</td>
      <td><code class="language-plaintext highlighter-rouge">mspa</code></td>
    </tr>
    <tr>
      <td>curl</td>
      <td><code class="language-plaintext highlighter-rouge">mpsa</code></td>
    </tr>
  </tbody>
</table>

<p>This seemingly small difference is again making it easy to fingerprint the clients.</p>
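
<p>A server-side check for this could be as simple as the following sketch. The lookup table is copied from the table above; the function names and the idea of returning <code class="language-plaintext highlighter-rouge">"unknown"</code> for unlisted orders are assumptions made for illustration.</p>

```python
# Pseudo-header orders from the table above.
KNOWN_ORDERS = {
    "masp": "Chrome",
    "mpas": "Firefox",
    "mspa": "Safari",
    "mpsa": "curl",
}


def pseudo_header_order(header_names):
    """Return the first letters of the pseudo-headers, in the order received."""
    # Pseudo-headers start with ':', so index 1 is the first real letter.
    return "".join(name[1] for name in header_names if name.startswith(":"))


def guess_client(header_names):
    """Look the order up in the table; anything else is reported as unknown."""
    return KNOWN_ORDERS.get(pseudo_header_order(header_names), "unknown")


# Header names in the order they appear in the Chrome log above (shortened).
chrome_headers = [":method", ":authority", ":scheme", ":path",
                  "sec-ch-ua", "user-agent", "accept"]
print(pseudo_header_order(chrome_headers), guess_client(chrome_headers))
```

<p>On its own this only narrows the client down to a family, so in practice it would be combined with the other parameters described in this post.</p>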

<h3 id="the-priority-frame">The <code class="language-plaintext highlighter-rouge">PRIORITY</code> frame</h3>
<p>In HTTP/2 the client can define stream priorities. For example, the client may want to prioritize receiving JS scripts over images.
This article being long enough, I will not describe this mechanism in full detail. However, it is important to know two things:</p>
<ul>
  <li>The client can define a tree of streams, by specifying for each stream a parent stream. This tree defines dependencies for prioritization purposes.</li>
  <li>The client can define for each stream a weight, which sets its priority relative to its siblings in the tree.</li>
</ul>

<p>Both the parent of each stream and its weight are communicated via the <code class="language-plaintext highlighter-rouge">PRIORITY</code> frame.
Firefox, for example, builds a rather complex tree of streams that looks like the following:</p>

<p><img src="/assets/img/firefox_stream_tree.png" alt="Stream priorities in Firefox" style="display: block; margin-left: auto; margin-right: auto" /></p>

<p>To create this tree, Firefox by default will send a <code class="language-plaintext highlighter-rouge">PRIORITY</code> frame for streams 3, 5, 7, 9, 11 and 13, defining their parents and weights.
Inspecting the nghttpd logs we observe this as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>recv PRIORITY frame &lt;length=5, flags=0x00, stream_id=3&gt;
	(dep_stream_id=0, weight=201, exclusive=0)
recv PRIORITY frame &lt;length=5, flags=0x00, stream_id=5&gt;
	(dep_stream_id=0, weight=101, exclusive=0)
recv PRIORITY frame &lt;length=5, flags=0x00, stream_id=7&gt;
	(dep_stream_id=0, weight=1, exclusive=0)
recv PRIORITY frame &lt;length=5, flags=0x00, stream_id=9&gt;
	(dep_stream_id=7, weight=1, exclusive=0)
recv PRIORITY frame &lt;length=5, flags=0x00, stream_id=11&gt;
	(dep_stream_id=3, weight=1, exclusive=0)
recv PRIORITY frame &lt;length=5, flags=0x00, stream_id=13&gt;
	(dep_stream_id=0, weight=241, exclusive=0)
</code></pre></div></div>

<p>The use of this specific tree structure and these specific weights is thus very indicative of Firefox.</p>
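
<p>To see how the logged frames translate into a tree, here is a small sketch that rebuilds the parent/child structure from the <code class="language-plaintext highlighter-rouge">(stream_id, dep_stream_id, weight)</code> values in the log above and renders it as indented text. The function names are mine, and the sketch ignores the <code class="language-plaintext highlighter-rouge">exclusive</code> flag, which is 0 in every frame shown.</p>

```python
from collections import defaultdict

# (stream_id, dep_stream_id, weight) taken from the nghttpd log above.
FIREFOX_PRIORITIES = [
    (3, 0, 201), (5, 0, 101), (7, 0, 1),
    (9, 7, 1), (11, 3, 1), (13, 0, 241),
]


def build_tree(priorities):
    """Map each parent stream id to its children as (stream_id, weight)."""
    children = defaultdict(list)
    for stream_id, parent, weight in priorities:
        children[parent].append((stream_id, weight))
    return children


def render(children, parent=0, depth=0):
    """Return the tree as indented text lines, starting below `parent`."""
    lines = []
    for stream_id, weight in children.get(parent, []):
        lines.append("  " * depth + f"stream {stream_id} (weight {weight})")
        lines.extend(render(children, stream_id, depth + 1))
    return lines


print("\n".join(render(build_tree(FIREFOX_PRIORITIES))))
```

<p>A fingerprinting server does not need to rebuild the tree, of course: the raw sequence of parents and weights is already distinctive enough to compare against known clients.</p>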

<h2 id="where-is-http2-fingerprinting-being-used">Where is HTTP/2 fingerprinting being used?</h2>

<p>HTTP/2 fingerprinting lets the server identify the client reliably before responding with data.
It is therefore used for purposes similar to those of <a href="/networks/2022/06/17/tls-fingerprinting.html">TLS fingerprinting</a>: usually by commercial anti-DDoS and anti-bot solutions attempting to block automated tools while allowing real browsers.</p>

<p>I’ve personally witnessed this method being used in the wild, such that real browsers were handed the real site’s content, but <a href="https://github.com/lwthiker/curl-impersonate">curl-impersonate</a>, for example, got blocked. This was before HTTP/2 impersonation was fully implemented in curl-impersonate.</p>

<h2 id="controlling-your-http2-signature">Controlling your HTTP/2 signature</h2>
<p>As seen above, the HTTP/2 protocol contains a lot of details, and the parameters involved are not always configurable by the user.
Tools and libraries will usually try to abstract the HTTP/2 details away, and as a result each of these tools ends up with its own unique HTTP/2 signature, which cannot be easily altered.</p>

<p>To control your HTTP/2 signatures there are three methods that I’m aware of:</p>
<ul>
  <li>Use a headless browser through a framework such as <a href="https://github.com/puppeteer/puppeteer">Puppeteer</a> or <a href="https://github.com/microsoft/playwright">Playwright</a>. By using a real browser, you get that browser’s HTTP/2 signature.</li>
  <li>Use <a href="https://github.com/lwthiker/curl-impersonate">curl-impersonate</a>, my own fork of the popular curl tool, which supports impersonating real browsers. Its latest version has much better HTTP/2 impersonation support: it can impersonate the HTTP/2 signatures of Firefox and Chrome pretty well, including all the parameters mentioned in this article. Its main advantage is that it combines this with the correct TLS signature as well.</li>
  <li>Write your own HTTP/2 client code through a low-level library such as <a href="https://github.com/nghttp2/nghttp2">nghttp2</a>, which gives you full control over all parameters.</li>
</ul>

<h2 id="checking-a-clients-http2-signature">Checking a client’s HTTP/2 signature</h2>

<p>You may wonder how to check a client’s HTTP/2 signature.
Unlike TLS fingerprinting, which relies on an unencrypted TLS Client Hello packet, the HTTP/2 frames will almost always be encrypted.
This makes it a bit harder to inspect. There are two options which I like to use.</p>

<ul>
  <li>
    <p>Capture the encrypted session in Wireshark while defining the <code class="language-plaintext highlighter-rouge">SSLKEYLOGFILE</code> environment variable. Most clients will then write a keylog file which Wireshark can use to decrypt the session. Full instructions are available <a href="https://everything.curl.dev/usingcurl/tls/sslkeylogfile">here</a>. The decrypted frames will look like the following (note the presence of the frames discussed above):
<img src="/assets/img/http2_decrypted.png" alt="Decrypted HTTP2 frames" style="display: block; margin-left: auto; margin-right: auto" /></p>
  </li>
  <li>
    <p>Use <a href="https://nghttp2.org/documentation/nghttpd.1.html"><code class="language-plaintext highlighter-rouge">nghttpd</code></a>, a small HTTP/2 server. It is already packaged for most Linux distributions and macOS. To use it, first <a href="https://devcenter.heroku.com/articles/ssl-certificate-self">create a self-signed SSL key and certificate</a>, then run it as follows:</p>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nghttpd -v 8443 server.key server.crt
</code></pre></div>    </div>
    <p>Connect a client to <code class="language-plaintext highlighter-rouge">https://localhost:8443</code> and <code class="language-plaintext highlighter-rouge">nghttpd</code> will log all the frames it receives with all the parameters.</p>
  </li>
</ul>
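
<p>Once you have an <code class="language-plaintext highlighter-rouge">nghttpd</code> log, pulling the settings out of it can be scripted. The following is a minimal sketch that assumes the log lines look exactly like the excerpts earlier in this post; the regular expression and function name are my own. Note that a full log may also contain the settings that <code class="language-plaintext highlighter-rouge">nghttpd</code> itself sends, which this sketch does not filter out.</p>

```python
import re

# Matches lines such as "[SETTINGS_HEADER_TABLE_SIZE(0x01):65536]".
SETTING_LINE = re.compile(r"\[(SETTINGS_[A-Z_]+)\(0x([0-9a-fA-F]+)\):(\d+)\]")


def extract_settings(log_text):
    """Return (name, id, value) for every SETTINGS entry found in the log."""
    return [
        (name, int(ident, 16), int(value))
        for name, ident, value in SETTING_LINE.findall(log_text)
    ]


# The Firefox settings lines from the excerpt earlier in this post.
sample_log = """
    [SETTINGS_HEADER_TABLE_SIZE(0x01):65536]
    [SETTINGS_INITIAL_WINDOW_SIZE(0x04):131072]
    [SETTINGS_MAX_FRAME_SIZE(0x05):16384]
"""

for entry in extract_settings(sample_log):
    print(entry)
```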

<h3 id="the-ts1-method-and-library">The TS1 method and library</h3>

<p>TS1 is a method and a Python package I developed for the purpose of checking and comparing clients’ signatures. It is available at <a href="https://github.com/lwthiker/ts1">https://github.com/lwthiker/ts1</a> or on <a href="https://pypi.org/project/ts1-signatures/">PyPI</a>.</p>

<p>TS1 takes all the HTTP/2 frames the client sends up to and including the HEADERS frame, and encodes them into a JSON format that looks like the following (shown is a truncated version):</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
    </span><span class="nl">"frames"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
        </span><span class="p">{</span><span class="w">
            </span><span class="nl">"frame_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"SETTINGS"</span><span class="p">,</span><span class="w">
            </span><span class="nl">"stream_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
            </span><span class="nl">"settings"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
                </span><span class="p">{</span><span class="w">
                    </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
                    </span><span class="nl">"value"</span><span class="p">:</span><span class="w"> </span><span class="mi">65536</span><span class="w">
                </span><span class="p">},</span><span class="w">
                </span><span class="p">{</span><span class="w">
                    </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w">
                    </span><span class="nl">"value"</span><span class="p">:</span><span class="w"> </span><span class="mi">131072</span><span class="w">
                </span><span class="p">},</span><span class="w">
                </span><span class="p">{</span><span class="w">
                    </span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w">
                    </span><span class="nl">"value"</span><span class="p">:</span><span class="w"> </span><span class="mi">16384</span><span class="w">
                </span><span class="p">}</span><span class="w">
            </span><span class="p">]</span><span class="w">
        </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w">
            </span><span class="nl">"frame_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"WINDOW_UPDATE"</span><span class="p">,</span><span class="w">
            </span><span class="nl">"stream_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
            </span><span class="nl">"window_size_increment"</span><span class="p">:</span><span class="w"> </span><span class="mi">12517377</span><span class="w">
        </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w">
            </span><span class="nl">"frame_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PRIORITY"</span><span class="p">,</span><span class="w">
            </span><span class="nl">"stream_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w">
            </span><span class="nl">"priority"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
                </span><span class="nl">"dep_stream_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w">
                </span><span class="nl">"weight"</span><span class="p">:</span><span class="w"> </span><span class="mi">201</span><span class="p">,</span><span class="w">
                </span><span class="nl">"exclusive"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="w">
            </span><span class="p">}</span><span class="w">
        </span><span class="p">},</span><span class="w">
        </span><span class="p">{</span><span class="w">
            </span><span class="nl">"frame_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"HEADERS"</span><span class="p">,</span><span class="w">
            </span><span class="nl">"stream_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">15</span><span class="p">,</span><span class="w">
            </span><span class="nl">"pseudo_headers"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="w">
                </span><span class="s2">":method"</span><span class="p">,</span><span class="w">
                </span><span class="s2">":path"</span><span class="p">,</span><span class="w">
                </span><span class="s2">":authority"</span><span class="p">,</span><span class="w">
                </span><span class="s2">":scheme"</span><span class="w">
            </span><span class="p">]</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">]</span><span class="w">
</span><span class="p">}</span><span class="w">

</span></code></pre></div></div>

<p>The JSON is then turned into a <em>canonical form</em>, a compactified form according to certain rules:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="nl">"frames"</span><span class="p">:</span><span class="w"> </span><span class="p">[{</span><span class="nl">"frame_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"SETTINGS"</span><span class="p">,</span><span class="w"> </span><span class="nl">"settings"</span><span class="p">:</span><span class="w"> </span><span class="p">[{</span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="nl">"value"</span><span class="p">:</span><span class="w"> </span><span class="mi">65536</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="nl">"value"</span><span class="p">:</span><span class="w"> </span><span class="mi">131072</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"id"</span><span class="p">:</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nl">"value"</span><span class="p">:</span><span class="w"> </span><span class="mi">16384</span><span class="p">}],</span><span class="w"> </span><span class="nl">"stream_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"frame_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"WINDOW_UPDATE"</span><span class="p">,</span><span class="w"> </span><span class="nl">"stream_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span 
class="nl">"window_size_increment"</span><span class="p">:</span><span class="w"> </span><span class="mi">12517377</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"frame_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"PRIORITY"</span><span class="p">,</span><span class="w"> </span><span class="nl">"priority"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nl">"dep_stream_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nl">"exclusive"</span><span class="p">:</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span><span class="w"> </span><span class="nl">"weight"</span><span class="p">:</span><span class="w"> </span><span class="mi">201</span><span class="p">},</span><span class="w"> </span><span class="nl">"stream_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">3</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"frame_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"HEADERS"</span><span class="p">,</span><span class="w"> </span><span class="nl">"pseudo_headers"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">":method"</span><span class="p">,</span><span class="w"> </span><span class="s2">":path"</span><span class="p">,</span><span class="w"> </span><span class="s2">":authority"</span><span class="p">,</span><span class="w"> </span><span class="s2">":scheme"</span><span class="p">],</span><span class="w"> </span><span class="nl">"stream_id"</span><span class="p">:</span><span class="w"> </span><span class="mi">15</span><span class="p">}]}</span><span class="w">
</span></code></pre></div></div>

<p>then a SHA1 hash of the string is calculated to produce the TS1 signature hash:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>c9bb208868a10863867841a2e5bcb3b903719784
</code></pre></div></div>

<p>Different clients produce different hashes, which can be stored in a database to compare clients’ signatures easily.</p>
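
<p>As a rough Python sketch of that database idea: the <code class="language-plaintext highlighter-rouge">signature_hash</code> and <code class="language-plaintext highlighter-rouge">identify</code> helpers, the two reduced signatures and the client labels below are all hypothetical. The JSON encoding (sorted keys, default separators) is an assumption rather than TS1’s actual rules, so the resulting hashes are not comparable with real TS1 hashes.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib
import json


def signature_hash(signature):
    """SHA1 over a deterministic JSON encoding of the signature."""
    # Assumption: sorted keys and default separators. The real TS1 encoding
    # rules may differ, so these hashes are not comparable to TS1's.
    encoded = json.dumps(signature, sort_keys=True)
    return hashlib.sha1(encoded.encode("utf-8")).hexdigest()


# Hypothetical signatures, reduced to the SETTINGS frame for brevity.
client_a = {"frames": [{"frame_type": "SETTINGS", "stream_id": 0,
                        "settings": [{"id": 1, "value": 65536},
                                     {"id": 4, "value": 131072},
                                     {"id": 5, "value": 16384}]}]}
client_b = {"frames": [{"frame_type": "SETTINGS", "stream_id": 0,
                        "settings": [{"id": 3, "value": 100},
                                     {"id": 4, "value": 33554432}]}]}

# The "database": each known hash is mapped to a client label.
known_clients = {
    signature_hash(client_a): "client A",
    signature_hash(client_b): "client B",
}


def identify(signature):
    """Look up a signature by its hash; unseen signatures are 'unknown'."""
    return known_clients.get(signature_hash(signature), "unknown")


print(identify(client_a))        # client A
print(identify({"frames": []}))  # unknown
</code></pre></div></div>

<p>Because the lookup key is a fixed-length hash, comparing a new connection against known clients is a single dictionary (or database index) lookup.</p>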

<p>More details about using the TS1 library can be found on its <a href="https://github.com/lwthiker/ts1">GitHub page</a>.</p>

<h2 id="concluding">Concluding</h2>

<p>I will conclude with the same words as in the previous post: fingerprinting has become extremely common throughout the web, and while it is used for legitimate purposes such as blocking DDoS attacks, it is also making the web less open, less private and much more restrictive towards specific web clients. I <a href="/opensource/2022/05/21/firefox-flagged-suspicious.html">have witnessed before</a> how websites mark certain browsers as suspicious while letting others in (probably unintentionally), with TLS and HTTP fingerprinting being the main methods used to achieve that.</p>

<p>With growing awareness of the prevalence of such techniques, I hope that browsers, web clients and future protocol designers will be more attentive to these kinds of issues.</p>

<p><br /></p>

<hr />

<p><br /></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>This method, though relatively unknown, is not new. After doing my own research on the subject for curl-impersonate, I found <a href="https://www.blackhat.com/docs/eu-17/materials/eu-17-Shuster-Passive-Fingerprinting-Of-HTTP2-Clients-wp.pdf">this BlackHat presentation</a> detailing research with similar conclusions. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p><a href="https://w3techs.com/technologies/details/ce-http2">https://w3techs.com/technologies/details/ce-http2</a> <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p><a href="https://httpwg.org/specs/rfc7540.html#SettingValues">https://httpwg.org/specs/rfc7540.html#SettingValues</a> <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:4" role="doc-endnote">
      <p><a href="https://github.com/curl/curl/blob/10cd69623a544c83bae6d90acdf141981ae53174/lib/http2.c#L62">Source code reference</a> for curl’s window size <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="networks" /><summary type="html"><![CDATA[HTTP/2 fingerprinting is a method by which web servers can identify which client is sending the request to them. It can identify the browser type and version, for instance, or whether a script is used. The method relies on the internals of the HTTP/2 protocol which are less widely known than those of its simpler predecessor HTTP/1.1. In this post I will first give a short description of the HTTP/2 protocol, then provide details on how a web server can use the protocol’s various parameters to identify the client. Finally, I will list methods of checking and controlling a client’s HTTP/2 signature.]]></summary></entry><entry><title type="html">TLS fingerprinting: How it works, where it is used and how to control your signature</title><link href="https://lwthiker.com/networks/2022/06/17/tls-fingerprinting.html" rel="alternate" type="text/html" title="TLS fingerprinting: How it works, where it is used and how to control your signature" /><published>2022-06-17T12:00:00+00:00</published><updated>2022-06-17T12:00:00+00:00</updated><id>https://lwthiker.com/networks/2022/06/17/tls-fingerprinting</id><content type="html" xml:base="https://lwthiker.com/networks/2022/06/17/tls-fingerprinting.html"><![CDATA[<p>In this two-part series of posts I would like to expand on server-side browser fingerprinting. Server-side fingerprinting is a collection of techniques used by web servers to identify which web client is making a request based on network parameters sent by the client. By web client I mean the type of client, as in which browser or CLI tool, and not a specific user, which is what a cookie identifies.</p>

<p>A different technique from server-side fingerprinting is client-side fingerprinting, in which JavaScript is injected to test the client. This may be the subject of a future post, and I’ll focus on server-side fingerprinting for now.</p>

<p>TLS fingerprinting is a widely deployed server-side technique. It allows web servers to identify the client to a high degree of accuracy based on the first packet of the connection alone. I will give examples below to demonstrate just how easy it is to identify the client from its TLS parameters.</p>

<p>This is the first part of a two-part series about web fingerprinting. Read the second post about HTTP/2 fingerprinting <a href="/networks/2022/06/17/http2-fingerprinting.html">here</a>.</p>

<h3 class="no_toc" id="table-of-contents">Table of contents</h3>

<ul id="markdown-toc">
  <li><a href="#how-does-tls-fingerprinting-work" id="markdown-toc-how-does-tls-fingerprinting-work">How does TLS fingerprinting work</a></li>
  <li><a href="#methods-for-signature-calculation" id="markdown-toc-methods-for-signature-calculation">Methods for signature calculation</a>    <ul>
      <li><a href="#ja3" id="markdown-toc-ja3">JA3</a></li>
      <li><a href="#ts1" id="markdown-toc-ts1">TS1</a></li>
    </ul>
  </li>
  <li><a href="#where-is-tls-fingerprinting-being-used" id="markdown-toc-where-is-tls-fingerprinting-being-used">Where is TLS fingerprinting being used?</a></li>
  <li><a href="#controlling-your-tls-signature" id="markdown-toc-controlling-your-tls-signature">Controlling your TLS signature</a></li>
  <li><a href="#whats-next-for-tls-fingerprinting" id="markdown-toc-whats-next-for-tls-fingerprinting">What’s next for TLS fingerprinting?</a></li>
</ul>

<h2 id="how-does-tls-fingerprinting-work">How does TLS fingerprinting work</h2>

<p>TLS is the evolution of SSL, the protocol previously responsible for handling encrypted connections between web clients and servers. SSL is no longer in common use, but its name is still mistakenly used to refer to TLS as well.</p>

<p>Whenever a web client - a browser, script or a command line tool - accesses a TLS-encrypted site (<code class="language-plaintext highlighter-rouge">https://...</code>), it first performs a <em>TLS handshake</em> with the server. Here is a schematic diagram, courtesy of Wikipedia:</p>

<p><img src="/assets/img/Full_TLS_1.2_Handshake.png" alt="TLS handshake" style="display: block; margin-left: auto; margin-right: auto" /></p>

<p>The first message is the <em>TLS client hello</em>, sent by the client to the server. In this message the client declares to the server what parts of the TLS protocol it supports. The following are examples of parameters sent by the client:</p>
<ul>
  <li>The versions of the TLS protocol the client supports (from TLS 1.0 up to TLS 1.3).</li>
  <li>The cryptographic algorithms the client supports for data encryption, known as cipher suites.</li>
  <li>The cryptographic algorithms the client supports for digital signatures.</li>
</ul>

<p>As it happens, each client uses a different TLS library: Firefox uses NSS, Chrome uses BoringSSL, Safari uses Secure Transport, and Python uses OpenSSL. The result is that the above parameters differ significantly between clients. Here is an example of the cipher suites list declared by Chrome in the TLS client hello, as captured by Wireshark:</p>

<p><img src="/assets/img/chrome_cipher_list.png" alt="Chrome cipher list" /></p>

<p>This list - its contents and the order of ciphers - is different depending on the TLS client in use.
In addition to that, TLS is such a complex protocol that it has many extensions, each with its own set of additional parameters <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. To give some examples:</p>
<ul>
  <li>Some clients support compressing the exchanged certificates through a <a href="https://datatracker.ietf.org/doc/html/rfc8879">dedicated TLS extension</a>.</li>
  <li>Some clients support negotiating parameters for the underlying protocol (e.g. HTTP/2) through a dedicated TLS extension called <a href="https://datatracker.ietf.org/doc/html/draft-vvv-tls-alps">ALPS</a>.</li>
  <li>Some clients add a fake TLS extension called <a href="https://tools.ietf.org/id/draft-ietf-tls-grease-01.html">GREASE</a>.</li>
</ul>
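
<p>GREASE values are easy to spot programmatically. As specified in RFC 8701, they are the sixteen 16-bit values 0x0A0A, 0x1A1A, and so on up to 0xFAFA. The following Python sketch (the <code class="language-plaintext highlighter-rouge">is_grease</code> helper is just an illustrative name) picks them out of a list of extension IDs, here the extension list from the Chrome JA3 example further down in this post:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def is_grease(value):
    """Return True if a 16-bit TLS code point is one of the GREASE values."""
    high, low = divmod(value, 256)
    # Both bytes are identical and the low nibble of each byte is 0xA.
    return high == low and low % 16 == 0x0A


# Extension IDs as they appear in the Chrome JA3 example in this post.
extensions = [23130, 0, 23, 65281, 10, 11, 35, 16, 5, 13, 18, 51, 45, 43,
              27, 17513, 39578, 21]

grease = [ext for ext in extensions if is_grease(ext)]
print(grease)  # [23130, 39578]
</code></pre></div></div>

<p>In hexadecimal the two matches are 0x5A5A and 0x9A9A.</p>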

<p>Here is what Chrome’s list of TLS extensions looks like in Wireshark:</p>

<p><img src="/assets/img/chrome_tls_extension_list.png" alt="Chrome TLS extension list" /></p>

<p>For each browser the above list of extensions is different, and the order of extensions may differ as well.</p>

<p>The following is a comparison table demonstrating notable differences in TLS signatures of common clients<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>:</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Chrome</th>
      <th>Safari</th>
      <th>Firefox</th>
      <th>Python</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>No. of cipher suites</td>
      <td>16</td>
      <td>27</td>
      <td>17</td>
      <td>43</td>
    </tr>
    <tr>
      <td>No. of signature algorithms</td>
      <td>8</td>
      <td>11</td>
      <td>11</td>
      <td>20</td>
    </tr>
    <tr>
      <td>ALPS extension</td>
      <td>Yes</td>
      <td>No</td>
      <td>No</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Certificate compression method</td>
      <td>Brotli</td>
      <td>Zlib</td>
      <td>None</td>
      <td>None</td>
    </tr>
    <tr>
      <td>GREASE extension</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>No</td>
      <td>No</td>
    </tr>
  </tbody>
</table>

<p>With this in mind it is obvious that web clients can be easily distinguished based on their TLS signature. The remarkable thing is that all of this information is available in the very first packet the client sends to the server. The server can thus infer which client is connected even before responding with any data. Moreover, until <a href="https://blog.mozilla.org/security/2021/01/07/encrypted-client-hello-the-future-of-esni-in-firefox/">encrypted client hello</a> becomes the standard, any third-party listener on the network can infer this as well.</p>

<h2 id="methods-for-signature-calculation">Methods for signature calculation</h2>

<h3 id="ja3">JA3</h3>

<p><a href="https://github.com/salesforce/ja3">JA3</a> is a popular method used to formalize the notion of a TLS fingerprint. It takes a Client Hello packet and produces a hash identifying the client.</p>

<p>JA3 works by concatenating multiple fields of the Client Hello and then hashing them. The fields are:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SSLVersion,Cipher,SSLExtension,EllipticCurve,EllipticCurvePointFormat
</code></pre></div></div>

<p>For example, for a Chrome browser this would be:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>771,39578-4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,23130-0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513-39578-21,39578-29-23-24,0
</code></pre></div></div>

<p>This is then hashed with MD5 to produce the JA3 signature:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>e3501e1725c83830dd40f12930cc6eaa
</code></pre></div></div>

<p>JA3 is the de facto standard in this regard and has been integrated, for example, into Wireshark.</p>

<p>It is important to note that JA3 does not take into account all the parameters in the Client Hello. This means that it is possible to have two different Client Hellos with the same JA3 signature<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>.</p>
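
<p>The JA3 steps, and the collision caveat, can be sketched in a few lines of Python. This is an illustration and not the reference JA3 implementation: the helper names and the two shortened Client Hellos are hypothetical, and details of the real algorithm beyond what is described above are not handled.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Join the five JA3 fields: commas between fields, dashes within a field."""
    fields = [[version], ciphers, extensions, curves, point_formats]
    return ",".join("-".join(str(v) for v in field) for field in fields)


def ja3_hash(ja3):
    """MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


# Hypothetical, shortened Client Hello used only for illustration.
hello_a = {
    "version": 771,
    "ciphers": [4865, 4866, 4867],
    "extensions": [0, 23, 10, 11, 27],
    "curves": [29, 23, 24],
    "point_formats": [0],
    # Parameter inside extension 27 (compress_certificate); JA3 never reads it.
    "cert_compression": ["brotli"],
}
# Same Client Hello, but with a different certificate compression method.
hello_b = dict(hello_a, cert_compression=["zlib"])


def fingerprint(hello):
    """JA3 hash of a Client Hello given as a dict with the five JA3 fields."""
    return ja3_hash(ja3_string(hello["version"], hello["ciphers"],
                               hello["extensions"], hello["curves"],
                               hello["point_formats"]))


print(ja3_string(771, [4865, 4866], [0, 23], [29], [0]))  # 771,4865-4866,0-23,29,0
print(fingerprint(hello_a) == fingerprint(hello_b))        # True
</code></pre></div></div>

<p>The two Client Hellos differ in their certificate compression method, yet they hash to the same value because that parameter is not one of the five JA3 fields.</p>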

<h3 id="ts1">TS1</h3>

<p><a href="https://github.com/lwthiker/ts1">TS1</a> is my take on creating a unique hash per TLS signature. It was inspired by JA3 but is more comprehensive in that it encodes all the parameters of the TLS Client Hello message. I’ve created and used it myself while working on <a href="https://github.com/lwthiker/curl-impersonate">curl-impersonate</a>.</p>

<p>TS1 encodes the parameters of the Client Hello message in JSON format according to certain rules:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="nl">"client_hello"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="nl">"ciphersuites"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mi">4865</span><span class="p">,</span><span class="w"> </span><span class="mi">4867</span><span class="p">,</span><span class="w"> </span><span class="mi">4866</span><span class="p">,</span><span class="w"> </span><span class="mi">49195</span><span class="p">,</span><span class="w"> </span><span class="mi">49199</span><span class="p">,</span><span class="w"> </span><span class="mi">52393</span><span class="p">,</span><span class="w"> </span><span class="mi">52392</span><span class="p">,</span><span class="w"> </span><span class="mi">49196</span><span class="p">,</span><span class="w"> </span><span class="mi">49200</span><span class="p">,</span><span class="w"> </span><span class="mi">49162</span><span class="p">,</span><span class="w"> </span><span class="mi">49161</span><span class="p">,</span><span class="w"> </span><span class="mi">49171</span><span class="p">,</span><span class="w"> </span><span class="mi">49172</span><span class="p">,</span><span class="w"> </span><span class="mi">156</span><span class="p">,</span><span class="w"> </span><span class="mi">157</span><span class="p">,</span><span class="w"> </span><span class="mi">47</span><span class="p">,</span><span class="w"> </span><span class="mi">53</span><span class="p">],</span><span class="w"> </span><span class="nl">"comp_methods"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="nl">"extensions"</span><span class="p">:</span><span class="w"> </span><span class="p">[{</span><span class="nl">"type"</span><span class="p">:</span><span 
class="w"> </span><span class="s2">"server_name"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"extended_master_secret"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"renegotiation_info"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">14</span><span class="p">,</span><span class="w"> </span><span class="nl">"supported_groups"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mi">29</span><span class="p">,</span><span class="w"> </span><span class="mi">23</span><span class="p">,</span><span class="w"> </span><span class="mi">24</span><span class="p">,</span><span class="w"> </span><span class="mi">25</span><span class="p">,</span><span class="w"> </span><span class="mi">256</span><span class="p">,</span><span class="w"> </span><span class="mi">257</span><span class="p">],</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"supported_groups"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"ec_point_formats"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="nl">"length"</span><span class="p">:</span><span class="w"> 
</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ec_point_formats"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"session_ticket"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"alpn_list"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"h2"</span><span class="p">,</span><span class="w"> </span><span class="s2">"http/1.1"</span><span class="p">],</span><span class="w"> </span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">14</span><span class="p">,</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"application_layer_protocol_negotiation"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nl">"status_request_type"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"status_request"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="nl">"sig_hash_algs"</span><span class="p">:</span><span class="w"> 
</span><span class="p">[</span><span class="mi">1027</span><span class="p">,</span><span class="w"> </span><span class="mi">1283</span><span class="p">,</span><span class="w"> </span><span class="mi">1539</span><span class="p">,</span><span class="w"> </span><span class="mi">515</span><span class="p">],</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"delegated_credentials"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"key_shares"</span><span class="p">:</span><span class="w"> </span><span class="p">[{</span><span class="nl">"group"</span><span class="p">:</span><span class="w"> </span><span class="mi">29</span><span class="p">,</span><span class="w"> </span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">32</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"group"</span><span class="p">:</span><span class="w"> </span><span class="mi">23</span><span class="p">,</span><span class="w"> </span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">65</span><span class="p">}],</span><span class="w"> </span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">107</span><span class="p">,</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"keyshare"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nl">"supported_versions"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"TLS_VERSION_1_3"</span><span class="p">,</span><span class="w"> </span><span 
class="s2">"TLS_VERSION_1_2"</span><span class="p">],</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"supported_versions"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">24</span><span class="p">,</span><span class="w"> </span><span class="nl">"sig_hash_algs"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="mi">1027</span><span class="p">,</span><span class="w"> </span><span class="mi">1283</span><span class="p">,</span><span class="w"> </span><span class="mi">1539</span><span class="p">,</span><span class="w"> </span><span class="mi">2052</span><span class="p">,</span><span class="w"> </span><span class="mi">2053</span><span class="p">,</span><span class="w"> </span><span class="mi">2054</span><span class="p">,</span><span class="w"> </span><span class="mi">1025</span><span class="p">,</span><span class="w"> </span><span class="mi">1281</span><span class="p">,</span><span class="w"> </span><span class="mi">1537</span><span class="p">,</span><span class="w"> </span><span class="mi">515</span><span class="p">,</span><span class="w"> </span><span class="mi">513</span><span class="p">],</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"signature_algorithms"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nl">"psk_ke_mode"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span 
class="s2">"psk_key_exchange_modes"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"length"</span><span class="p">:</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nl">"record_size_limit"</span><span class="p">:</span><span class="w"> </span><span class="mi">16385</span><span class="p">,</span><span class="w"> </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"record_size_limit"</span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"padding"</span><span class="p">}],</span><span class="w"> </span><span class="nl">"handshake_version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TLS_VERSION_1_2"</span><span class="p">,</span><span class="w"> </span><span class="nl">"record_version"</span><span class="p">:</span><span class="w"> </span><span class="s2">"TLS_VERSION_1_0"</span><span class="p">,</span><span class="w"> </span><span class="nl">"session_id_length"</span><span class="p">:</span><span class="w"> </span><span class="mi">32</span><span class="p">}}</span><span class="w">
</span></code></pre></div></div>

<p>and then calculates its SHA1 hash to produce the TS1 signature:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>889b4383dcfee0d3dc4c472d3d40568028842b3e
</code></pre></div></div>

<p>Different clients produce different hashes, which can be stored in a database to compare clients’ signatures easily.</p>

<p>TS1 signatures encode more parameters than JA3 and therefore give a more accurate picture of the client. Another advantage is that, thanks to the use of JSON, the format can accommodate TLS extensions that are not yet defined and that may hold crucial client-identifying information in the future.
The disadvantage of TS1 is that its JSON format is much more verbose than JA3’s simple format.</p>

<h2 id="where-is-tls-fingerprinting-being-used">Where is TLS fingerprinting being used?</h2>

<p>TLS fingerprinting is naturally used by anti-bot and anti-DDoS solutions to protect web pages against massive crawling or DDoS attacks. By checking if the client is a browser or a script (i.e. a bot), they can decide whether to allow the request, block it, or introduce an additional JavaScript-based challenge to further test the client.</p>

<p>Another interesting use case that caught my attention, though I haven’t seen it myself, is that of phishing campaigns. A phishing website will use TLS fingerprinting to detect whether the client is a browser. It will serve the phishing content to unsuspecting victims using a browser, but will block automated crawling by security products attempting to identify phishing websites.</p>

<h2 id="controlling-your-tls-signature">Controlling your TLS signature</h2>

<p>Most of the parameters in the TLS client hello message are not controllable by scripts or command line tools.
In Python, for example, <a href="https://hussainaliakbar.github.io/restricting-tls-version-and-cipher-suites-in-python-requests-and-testing-with-wireshark/">you can control the cipher suites list</a>, but it pretty much ends there.
Even with that in place, the underlying TLS library may not send the exact list you specified, as is the case with Python and OpenSSL.</p>
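
<p>As a minimal sketch using only Python’s standard <code class="language-plaintext highlighter-rouge">ssl</code> module, this is what controlling the cipher list looks like. The cipher string <code class="language-plaintext highlighter-rouge">ECDHE+AESGCM</code> is an arbitrary choice for the example. My understanding of the Python documentation is that TLS 1.3 suites cannot be disabled through <code class="language-plaintext highlighter-rouge">set_ciphers()</code>, which is one reason the list that ends up on the wire may differ from the one you asked for.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import ssl

# Start from the library defaults, then restrict the cipher suites.
context = ssl.create_default_context()
context.set_ciphers("ECDHE+AESGCM")

# get_ciphers() reports the suites this context has enabled, which
# depends on the OpenSSL version Python was built against.
for cipher in context.get_ciphers():
    print(cipher["protocol"], cipher["name"])
</code></pre></div></div>

<p>Printing the enabled suites before capturing traffic is a quick sanity check, but the authoritative answer is still the Client Hello as seen in a tool like Wireshark.</p>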

<p>The best currently available methods I’m aware of for controlling the full TLS signature are:</p>
<ul>
  <li><a href="https://github.com/puppeteer/puppeteer">Puppeteer</a>, which allows you to run a headless Chrome browser and control it with a script. By using a real browser, you get the TLS signature of that browser.</li>
  <li><a href="https://github.com/lwthiker/curl-impersonate">curl-impersonate</a>, my own fork of the popular <code class="language-plaintext highlighter-rouge">curl</code> tool with support for faking TLS signatures to impersonate a few popular browsers. It also comes with a fork of <code class="language-plaintext highlighter-rouge">libcurl</code>, called <code class="language-plaintext highlighter-rouge">libcurl-impersonate</code>, so you can use it programmatically in your code. Another option is to inject <code class="language-plaintext highlighter-rouge">libcurl-impersonate</code> into an already running application that uses the regular <code class="language-plaintext highlighter-rouge">libcurl</code>. You can read about the technical aspects of curl-impersonate in my previous posts (<a href="/reversing/2022/02/17/curl-impersonate-firefox.html">part 1</a>, <a href="/reversing/2022/02/20/impersonating-chrome-too.html">part 2</a>), and find more documentation in the <a href="https://github.com/lwthiker/curl-impersonate">GitHub repository</a>. An advantage of curl-impersonate is that the correct HTTP/2 fingerprint will be used as well. More on this in the <a href="/networks/2022/06/17/http2-fingerprinting.html">next post</a>.</li>
  <li><a href="https://github.com/cucyber/JA3Transport">JA3Transport</a> is a Go library that intends to fake JA3 signatures. I didn’t test it myself.</li>
</ul>

<h2 id="whats-next-for-tls-fingerprinting">What’s next for TLS fingerprinting?</h2>

<p>TLS fingerprinting has become extremely common throughout the web, and while it is used for legitimate purposes such as blocking DDoS attacks, it is also making the web less open, less private and much more restrictive towards specific web clients.</p>

<p>It is my impression that current tools for faking a client’s TLS signature are still immature. Using <code class="language-plaintext highlighter-rouge">curl-impersonate</code>, for example, requires you to write your own C code or inject it into existing applications that use libcurl.</p>

<p>The best solution would be for one of the TLS libraries to provide more fine-grained control for users. The kind of functionality that might be needed:</p>
<ul>
  <li>Allowing users to control the order of TLS extensions.</li>
  <li>Allowing users to control the exact list of ciphers.</li>
  <li>Supporting the latest TLS extensions that some browsers use.</li>
</ul>

<p>When this happens, packages for popular programming languages can emerge to take advantage of the functionality and to control their TLS signatures.</p>

<p><br /></p>

<hr />

<p><br /></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>The large number of available TLS extensions can be seen at <a href="https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml">https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>Chrome 101, Firefox 100, Safari 15.4, Python 3.8.10 with OpenSSL 1.1.1f and the requests library. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:3" role="doc-endnote">
      <p>For example, the parameters inside the TLS compressed-certificate extension are not taken into account. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="networks" /><summary type="html"><![CDATA[In this two-part series of posts I would like to expand on server-side browser fingerprinting. Server-side fingerprinting is a collection of techniques used by web servers to identify which web client is making a request based on network parameters sent by the client. By web client I mean the type of client, as in which browser or CLI tool, and not a specific user, which is what a cookie identifies.]]></summary></entry><entry><title type="html">Firefox appears to be flagged as suspicious by Cloudflare</title><link href="https://lwthiker.com/opensource/2022/05/21/firefox-flagged-suspicious.html" rel="alternate" type="text/html" title="Firefox appears to be flagged as suspicious by Cloudflare" /><published>2022-05-21T15:30:00+00:00</published><updated>2022-05-21T15:30:00+00:00</updated><id>https://lwthiker.com/opensource/2022/05/21/firefox-flagged-suspicious</id><content type="html" xml:base="https://lwthiker.com/opensource/2022/05/21/firefox-flagged-suspicious.html"><![CDATA[<p><strong>Update</strong>: <a href="https://news.ycombinator.com/item?id=31459258">Cloudflare’s response</a> indicates that this is a customer-specific rule and not a global policy. They did not mention what kind of rule is triggering this behavior though.</p>

<p>It appears that Firefox is now flagged as “suspicious” by Cloudflare’s anti-bot protection. When you browse to certain websites that are hosted on Cloudflare’s CDN and use this service, Firefox is served a JavaScript challenge. This is what it looks like:</p>

<p><img src="/assets/img/checking_your_browser.png" alt="Checking your browser" /></p>

<p>You can test it yourself: Browse to <a href="https://www.g2.com">https://www.g2.com</a>, which is a software reviews website. If you use Chrome or Edge, you will get the site’s content. However, use Firefox and you’ll most likely be served the challenge instead (make sure to clear cookies beforehand). This basically means you must have JS enabled to access the site, and you will incur a 2-3 second delay before the content is served.</p>

<p>This is not a good prospect for the open-source browser. If this behavior gets adopted on more sites, we can expect even more users to leave Firefox, as every web access will take a few more seconds.</p>

<p>From a technical standpoint it doesn’t make sense either. I don’t see any reason to “suspect” Firefox is a bot. If anything, Chrome is probably being used for web scraping at a much higher rate through projects like <a href="https://github.com/puppeteer/puppeteer">Puppeteer</a>.</p>

<p>To be clear, I don’t believe this behavior is intentional on Cloudflare’s side. The way they identify which browser you are using is through a combination of <a href="https://httptoolkit.tech/blog/tls-fingerprinting-node-js/">TLS fingerprinting</a> and HTTP fingerprinting (on which I might write an extended explanation later on). What I believe to be happening is that Cloudflare whitelists the signatures of browsers with large-enough market share, and Firefox happens to fall below that threshold. Even if that is the case, I do expect Cloudflare to actively whitelist Firefox. Open-source browsers are an important part of the web and should not be treated differently than their closed-source counterparts.</p>]]></content><author><name></name></author><category term="opensource" /><summary type="html"><![CDATA[Update: Cloudflare’s response indicates that this is a customer-specific rule and not a global policy. They did not mention what kind of rule is triggering this behavior though.]]></summary></entry><entry><title type="html">Impersonating Chrome, too</title><link href="https://lwthiker.com/reversing/2022/02/20/impersonating-chrome-too.html" rel="alternate" type="text/html" title="Impersonating Chrome, too" /><published>2022-02-20T10:00:00+00:00</published><updated>2022-02-20T10:00:00+00:00</updated><id>https://lwthiker.com/reversing/2022/02/20/impersonating-chrome-too</id><content type="html" xml:base="https://lwthiker.com/reversing/2022/02/20/impersonating-chrome-too.html"><![CDATA[<p>This is a continuation of the <a href="/reversing/2022/02/17/curl-impersonate-firefox.html">previous post</a>. If you didn’t read it, please go ahead and read at least until the TL;DR section. In summary, various web services perform <em>TLS fingerprinting</em> to identify whether you run a real browser like Chrome or Firefox or whether it is a tool like <code class="language-plaintext highlighter-rouge">curl</code> or a Python script. 
I created <a href="https://github.com/lwthiker/curl-impersonate"><code class="language-plaintext highlighter-rouge">curl-impersonate</code></a>, a modified version of <code class="language-plaintext highlighter-rouge">curl</code> that performs TLS handshakes which are identical to Firefox’s, thereby tricking said services into believing it is a real browser.</p>

<p>After uploading the repository I posted it to Hacker News. On the <a href="https://news.ycombinator.com/item?id=30378562">thread</a> someone suggested that</p>
<blockquote>
  <p>They should really be impersonating Chrome. If this takes off, Firefox has such a small user share that I could see sites just banning Firefox altogether, like they do with Tor</p>
</blockquote>

<p>Challenge accepted!</p>

<h2 id="tldr">TL;DR</h2>

<ul>
  <li>I re-compiled <code class="language-plaintext highlighter-rouge">curl</code> with BoringSSL, Chrome’s TLS library.</li>
  <li>I tweaked curl’s TLS code to perform a similar TLS handshake to Chrome, enabling some Google-specific TLS extensions on the way.</li>
  <li>This was still being detected by TLS fingerprinters, so I had to dive deeper into the encrypted session.</li>
  <li>Two small but crucial differences in the HTTP/2 frames revealed further how those fingerprinters work.</li>
  <li>I then patched the HTTP/2 code as well to impersonate Chrome.</li>
  <li>You can find the updated <code class="language-plaintext highlighter-rouge">curl-impersonate</code>, with full Chrome 98 impersonation, in the <a href="https://github.com/lwthiker/curl-impersonate">GitHub repository</a>.</li>
</ul>

<p>Let’s look at the details.</p>

<h2 id="using-boringssl">Using BoringSSL</h2>
<p>The first part of impersonating a browser is using the same TLS library. Otherwise you are going to hit a wall of missing features and varying implementations, as we shall see below. For Firefox I used NSS, as mentioned in the previous post. Chrome uses <a href="https://boringssl.googlesource.com/boringssl/">BoringSSL</a>, described as “a fork of OpenSSL that is designed to meet Google’s needs”. At first, looking at <a href="https://curl.se/docs/ssl-compared.html">curl’s list of SSL libraries</a>, I didn’t find BoringSSL and concluded that it was not supported. But it really <em>is</em> supported. You just replace OpenSSL with BoringSSL at build time and it works:</p>

<figure class="highlight"><pre><code class="language-shell" data-lang="shell">./configure <span class="nt">--with-openssl</span><span class="o">=</span>/path/to/boringssl</code></pre></figure>

<p>The full build procedure is in the <a href="https://github.com/lwthiker/curl-impersonate/blob/main/chrome/Dockerfile">Dockerfile</a>.</p>

<h2 id="the-client-hello-message">The Client Hello message</h2>
<p>The first message sent by TLS clients is called Client Hello. It contains a list of parameters and extensions, all of which can be used to fingerprint the client. For example, the <a href="https://github.com/salesforce/ja3">ja3 method</a> calculates a hash of some of them to create a unique fingerprint for each client. Our goal here is to match curl’s Client Hello and make it completely identical to Chrome’s. Here’s the important part of Chrome’s Client Hello message (Chrome 98, Windows 10, non-incognito):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Handshake Protocol: Client Hello
    Handshake Type: Client Hello (1)
    Length: 508
    Version: TLS 1.2 (0x0303)
    Random: b46aad...
    Session ID Length: 32
    Session ID: 74c03b...
    Cipher Suites Length: 32
    Cipher Suites (16 suites)
    Compression Methods Length: 1
    Compression Methods (1 method)
    Extensions Length: 403
    Extension: Reserved (GREASE) (len=0)
    Extension: server_name (len=17)
    Extension: extended_master_secret (len=0)
    Extension: renegotiation_info (len=1)
    Extension: supported_groups (len=10)
    Extension: ec_point_formats (len=2)
    Extension: session_ticket (len=0)
    Extension: application_layer_protocol_negotiation (len=14)
    Extension: status_request (len=5)
    Extension: signature_algorithms (len=18)
    Extension: signed_certificate_timestamp (len=0)
    Extension: key_share (len=43)
    Extension: psk_key_exchange_modes (len=2)
    Extension: supported_versions (len=7)
    Extension: compress_certificate (len=3)
    Extension: application_settings (len=5)
    Extension: Reserved (GREASE) (len=1)
    Extension: padding (len=203)
</code></pre></div></div>
<p>The process of matching curl’s Client Hello consists of:</p>
<ul>
  <li>Matching the Cipher Suites list, by using curl’s built-in <code class="language-plaintext highlighter-rouge">--ciphers</code> option.</li>
  <li>Enabling, disabling and modifying various extensions by modifying curl’s TLS code.</li>
</ul>

<p>I detailed some of the process in the <a href="/reversing/2022/02/17/curl-impersonate-firefox.html">previous post</a>, the main difference now being the use of BoringSSL instead of NSS. There were, however, some interesting Google-specific extensions to be dealt with.</p>

<h4 id="grease">GREASE</h4>
<p>As can be seen above, Chrome adds two extensions called <code class="language-plaintext highlighter-rouge">GREASE</code> before and after the main extension list. Firefox doesn’t do that, and in fact I don’t think NSS even supports it. The purpose of GREASE is to ensure TLS servers are future-proof by mixing in non-existent extensions, expecting the servers to ignore them until they become supported. There is a good explanation in this <a href="https://blog.cloudflare.com/why-tls-1-3-isnt-in-browsers-yet/">Cloudflare blog post</a>. To enable GREASE in <code class="language-plaintext highlighter-rouge">curl</code>, all that was needed was to call a single function:</p>

<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">SSL_CTX_set_grease_enabled</span><span class="p">(</span><span class="n">backend</span><span class="o">-&gt;</span><span class="n">ctx</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span></code></pre></figure>

<p>Because we are using the same BoringSSL implementation as Chrome, this adds the GREASE extensions at exactly the same place.</p>
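
<p>To make GREASE values easier to spot in a capture, here is a minimal Python sketch. It assumes the reserved code points listed in RFC 8701 (<code class="language-plaintext highlighter-rouge">0x0a0a</code>, <code class="language-plaintext highlighter-rouge">0x1a1a</code>, and so on up to <code class="language-plaintext highlighter-rouge">0xfafa</code>). The helper name <code class="language-plaintext highlighter-rouge">is_grease</code> and the sample extension list are made up for illustration and are not part of curl or BoringSSL.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">def is_grease(value):
    """Return True if a 16-bit TLS code point is a reserved GREASE value.

    RFC 8701 reserves 0x0a0a, 0x1a1a, ..., 0xfafa: both bytes are equal
    and the low nibble of each byte is 0xa.
    """
    high, low = divmod(value, 256)
    return high == low and low % 16 == 0x0A


# Extension types as they might appear in a Client Hello (hypothetical sample).
extensions = [0x0A0A, 0, 23, 65281, 10, 0xFAFA, 21]
print([ext for ext in extensions if not is_grease(ext)])</code></pre></figure>

<p>A fingerprinter that wants a stable signature would typically drop these values before comparing clients, since Chrome picks them at random.</p>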

<h4 id="compressed-certificates">Compressed Certificates</h4>

<p>Chrome adds the <code class="language-plaintext highlighter-rouge">compress_certificate</code> extension. This is what it looks like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Extension: compress_certificate (len=3)
    Type: compress_certificate (27)
    Length: 3
    Algorithms Length: 2
    Algorithm: brotli (2)
</code></pre></div></div>
<p>Chrome is telling the server here that it supports receiving certificates compressed using the Brotli compression algorithm. Brotli was developed at Google and is the <code class="language-plaintext highlighter-rouge">br</code> in the <code class="language-plaintext highlighter-rouge">Accept-Encoding: gzip, deflate, br</code> HTTP header that most browsers send out today. Going through the Chromium source code we find that this TLS extension is enabled in <a href="https://github.com/chromium/chromium/blob/c4d3c31083a2e1481253ff2d24298a1dfe19c754/net/ssl/cert_compression.cc">cert_compression.cc</a>. Again, it is a matter of a single line:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">SSL_CTX_add_cert_compression_alg</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">TLSEXT_cert_compression_brotli</span><span class="p">,</span>
                                 <span class="nb">nullptr</span> <span class="cm">/* compression not supported */</span><span class="p">,</span>
                                 <span class="n">DecompressBrotliCert</span><span class="p">);</span></code></pre></figure>

<p>Here <code class="language-plaintext highlighter-rouge">DecompressBrotliCert</code> is a simple proxy function between BoringSSL and the Brotli library. Copying the one-liner and the function over to curl enables the <code class="language-plaintext highlighter-rouge">compress_certificate</code> extension.</p>
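
<p>The three bytes of the extension body shown above can be decoded by hand. The following Python sketch does that, assuming the layout and algorithm codes from RFC 8879 (one length byte followed by 2-byte algorithm codes). The function name <code class="language-plaintext highlighter-rouge">parse_compress_certificate</code> is hypothetical.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"># Codes assigned by RFC 8879 for certificate compression algorithms.
ALGORITHMS = {1: "zlib", 2: "brotli", 3: "zstd"}


def parse_compress_certificate(body):
    """Decode the body of a compress_certificate extension.

    Layout: one length byte, then a list of 2-byte algorithm codes.
    """
    list_length = body[0]
    codes = [
        int.from_bytes(body[i:i + 2], "big")
        for i in range(1, 1 + list_length, 2)
    ]
    return [ALGORITHMS.get(code, "unknown") for code in codes]


# The 3-byte body from the capture above: length 2, then code 2.
print(parse_compress_certificate(bytes([0x02, 0x00, 0x02])))</code></pre></figure>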

<h4 id="alps">ALPS</h4>
<p>In the previous post I mentioned the <a href="https://en.wikipedia.org/wiki/Application-Layer_Protocol_Negotiation">ALPN extension</a>, which allows the client and server to decide whether to use HTTP/1.1 or HTTP/2 during the TLS handshake. It’s used by both Firefox and Chrome. Google took this one step further and proposed the <a href="https://www.ietf.org/archive/id/draft-vvv-tls-alps-01.html">ALPS extension</a>, which allows the client to send its HTTP/2 SETTINGS during the TLS handshake (more about SETTINGS later). This is the <code class="language-plaintext highlighter-rouge">application_settings</code> extension in the Client Hello. As of this writing, it is a non-standard TLS extension, but Google being Google, they love experimenting with our browsers and Chrome already adds it to its extension list. <a href="https://github.com/chromium/chromium/commit/cc8598642336ec1fbf0eaf9c226f96b0d0acdaaf">Here is the commit</a> enabling ALPS in Chrome about a year ago.</p>

<p>In the end, it was again a matter of adding a one-liner to curl, and now curl supports ALPS as well<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>:</p>

<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">SSL_add_application_settings</span><span class="p">(</span><span class="n">backend</span><span class="o">-&gt;</span><span class="n">handle</span><span class="p">,</span> <span class="s">"h2"</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span></code></pre></figure>
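
<p>As far as I can tell, ALPN and ALPS both carry a list of protocol names in the same wire format: a 2-byte total length followed by length-prefixed names. The Python sketch below encodes such a list and reproduces the <code class="language-plaintext highlighter-rouge">len=14</code> and <code class="language-plaintext highlighter-rouge">len=5</code> values from the Client Hello above. The ALPN list of <code class="language-plaintext highlighter-rouge">h2</code> and <code class="language-plaintext highlighter-rouge">http/1.1</code> is an assumption on my part (it is not shown in the capture), and <code class="language-plaintext highlighter-rouge">encode_protocol_list</code> is a made-up helper.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">def encode_protocol_list(names):
    """Encode protocol names as a TLS vector: 2-byte total length,
    then each name prefixed with its own 1-byte length."""
    entries = b"".join(
        len(name).to_bytes(1, "big") + name.encode("ascii") for name in names
    )
    return len(entries).to_bytes(2, "big") + entries


alpn = encode_protocol_list(["h2", "http/1.1"])  # assumed ALPN list
alps = encode_protocol_list(["h2"])              # same "h2" as in the call above
print(len(alpn), len(alps))</code></pre></figure>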

<h4 id="comparing-the-tls-fingerprint">Comparing the TLS fingerprint</h4>
<p>By the end of this process, the Client Hello is identical. Here is Chrome’s TLS fingerprint from <a href="https://ja3er.com">ja3er.com</a>:
<img src="/assets/img/impersonatingch1.png" alt="Cipher Suite Comparison" /></p>

<p>And here is ours:</p>

<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span>curl-impersonate <span class="se">\</span>
    <span class="nt">--ciphers</span> TLS_AES_128_GCM_SHA256,TLS_AES_256_GCM_SHA384,TLS_CHACHA20_POLY1305_SHA256,ECDHE-ECDSA-AES128-GCM-SHA256,ECDHE-RSA-AES128-GCM-SHA256,ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384,ECDHE-ECDSA-CHACHA20-POLY1305,ECDHE-RSA-CHACHA20-POLY1305,ECDHE-RSA-AES128-SHA,ECDHE-RSA-AES256-SHA,AES128-GCM-SHA256,AES256-GCM-SHA384,AES128-SHA,AES256-SHA <span class="se">\</span>
    <span class="nt">-X</span> GET <span class="s1">'https://ja3er.com/json'</span> | jq <span class="nb">.</span>
<span class="o">{</span>
  <span class="s2">"ja3_hash"</span>: <span class="s2">"b32309a26951912be7dba376398abc3b"</span>,
  <span class="s2">"ja3"</span>: <span class="s2">"771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-21,29-23-24,0"</span>,
<span class="o">}</span></code></pre></figure>

<p>It’s identical.</p>
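
<p>For readers unfamiliar with JA3: the hash is the MD5 digest of the comma-separated <code class="language-plaintext highlighter-rouge">ja3</code> string, which in turn lists the TLS version, cipher suites, extensions, curves and point formats. Here is a small Python sketch that splits the string from the output above and hashes it. The function names are hypothetical, and I’m deliberately not stating the expected digest here; run it and compare with the <code class="language-plaintext highlighter-rouge">ja3_hash</code> field yourself.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">import hashlib


def ja3_hash(ja3_string):
    """JA3 hash: the MD5 hex digest of the comma-separated JA3 string."""
    return hashlib.md5(ja3_string.encode("ascii")).hexdigest()


def split_ja3(ja3_string):
    """Split a JA3 string into its five fields, each a list of integers."""
    names = ["version", "ciphers", "extensions", "curves", "point_formats"]
    fields = ja3_string.split(",")
    return {
        name: [int(item) for item in field.split("-") if item]
        for name, field in zip(names, fields)
    }


ja3 = ("771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-"
       "156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-21,29-23-24,0")
fields = split_ja3(ja3)
print(len(fields["ciphers"]), len(fields["extensions"]))
print(ja3_hash(ja3))</code></pre></figure>

<p>Note that the string holds 15 cipher suites while the capture shows 16. As far as I understand, that is because JA3 leaves GREASE values out of the fingerprint.</p>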

<h2 id="diving-deeper">Diving deeper</h2>
<p>Remarkably, even with an identical TLS fingerprint, <strong>Protectify</strong> was still able to identify and block our dear <code class="language-plaintext highlighter-rouge">curl-impersonate</code> (Protectify is the fake name of the company from the previous post). To understand how, we must dive deeper into the encrypted TLS session.</p>

<h3 id="decrypting-the-tls-session">Decrypting the TLS session</h3>
<p>To inspect what’s inside the TLS session we first need to capture it in Wireshark and decrypt it. This is easily done by defining the <code class="language-plaintext highlighter-rouge">SSLKEYLOGFILE</code> environment variable. Both Chrome and Firefox would then write a keylog file to the specified location. You can then feed this file to Wireshark and it would decrypt the session for you. Handy!</p>

<p>Here’s what a decrypted Chrome session to <a href="https://wikipedia.org">wikipedia.org</a> looks like:
<img src="/assets/img/impersonatingch2.png" alt="Decrypted TLS" /></p>

<p>The session begins as follows:</p>
<ul>
  <li>Chrome sends the Client Hello message.</li>
  <li>The server responds with the Server Hello message.</li>
  <li>The server sends its certificate and the TLS handshake is done.</li>
  <li>The client and server immediately begin an HTTP/2 session (Remember ALPN?).</li>
  <li>Chrome sends a <code class="language-plaintext highlighter-rouge">SETTINGS</code> frame.</li>
  <li>Chrome sends a <code class="language-plaintext highlighter-rouge">HEADERS</code> frame with the <code class="language-plaintext highlighter-rouge">GET /</code> request.</li>
</ul>

<h4 id="the-settings-frame">The SETTINGS frame</h4>
<p>The <code class="language-plaintext highlighter-rouge">SETTINGS</code> frame is used to notify the server about a few HTTP/2 specific settings. Here’s what it looks like in Chrome:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Stream: SETTINGS, Stream ID: 0, Length 30
    ...
    Settings - Header table size : 65536
    Settings - Max concurrent streams : 1000
    Settings - Initial Windows size : 6291456
    Settings - Max header list size : 262144
    Settings - Unknown (10858) : 1359919199
</code></pre></div></div>

<p>Therein lies our first problem. Curl’s <code class="language-plaintext highlighter-rouge">SETTINGS</code> look completely different:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Stream: SETTINGS, Stream ID: 0, Length 18
    ...
    Settings - Max concurrent streams : 100
    Settings - Initial Windows size : 33554432
    Settings - Enable PUSH : 0
</code></pre></div></div>

<p>There are four notable differences:</p>
<ul>
  <li>Curl is sending different values for its settings.</li>
  <li>Curl is missing <code class="language-plaintext highlighter-rouge">Header table size</code> and <code class="language-plaintext highlighter-rouge">Max header list size</code>.</li>
  <li>Curl disables <a href="https://en.wikipedia.org/wiki/HTTP/2_Server_Push">HTTP/2 server push</a> because the command line curl doesn’t support it. This sticks out like a sore thumb in the SETTINGS frame.</li>
  <li>Chrome throws in a random setting at the end (shown as <code class="language-plaintext highlighter-rouge">Unknown</code>). My guess is that this is another Google invention with a similar purpose to the TLS GREASE explained above.</li>
</ul>

<p>Patching curl’s <a href="https://github.com/curl/curl/blob/b8072192926b28239aee989ac551d4c8fa8e0cf8/lib/http2.c#L1192">relevant function</a> solves these issues and makes the SETTINGS frame look identical. Here’s the <a href="https://github.com/lwthiker/curl-impersonate/blob/main/chrome/patches/curl-http2-c.patch">full patch</a>.</p>
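
<p>To see why the two frames have lengths 30 and 18, recall that each setting in a SETTINGS payload is a 2-byte identifier followed by a 4-byte value. The Python sketch below rebuilds both payloads from the values shown above. The identifiers come from the HTTP/2 specification (RFC 7540, section 6.5.2); the function names are hypothetical, and this is not the code from the patch.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">import struct

# Setting identifiers from the HTTP/2 specification (RFC 7540, section 6.5.2).
SETTING_NAMES = {
    1: "HEADER_TABLE_SIZE",
    2: "ENABLE_PUSH",
    3: "MAX_CONCURRENT_STREAMS",
    4: "INITIAL_WINDOW_SIZE",
    5: "MAX_FRAME_SIZE",
    6: "MAX_HEADER_LIST_SIZE",
}


def encode_settings(settings):
    """Pack (identifier, value) pairs: 2-byte id plus 4-byte value each."""
    return b"".join(struct.pack("!HI", ident, value) for ident, value in settings)


def decode_settings(payload):
    """Unpack a SETTINGS payload back into (identifier, value) pairs."""
    return list(struct.iter_unpack("!HI", payload))


chrome = encode_settings([(1, 65536), (3, 1000), (4, 6291456),
                          (6, 262144), (10858, 1359919199)])
curl = encode_settings([(3, 100), (4, 33554432), (2, 0)])
print(len(chrome), len(curl))
for ident, value in decode_settings(chrome):
    print(SETTING_NAMES.get(ident, "unknown"), value)</code></pre></figure>

<p>Both the set of identifiers and their order are visible to the server, which is what makes this frame usable as a fingerprint.</p>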

<h4 id="the-headers-frame">The HEADERS frame</h4>
<p>In HTTP/2, the <code class="language-plaintext highlighter-rouge">HEADERS</code> frame combines the method (e.g. GET), the URI and the HTTP headers all into a unified format. Here’s Chrome’s <code class="language-plaintext highlighter-rouge">HEADERS</code> frame:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Stream: HEADERS, Stream ID: 1, Length 438, GET /
    ...
    Header: :method: GET
    Header: :authority: wikipedia.org
    Header: :scheme: https
    Header: :path: /
    ...
    (Regular HTTP headers follow)
</code></pre></div></div>
<p>It always begins with the pseudo-headers <code class="language-plaintext highlighter-rouge">:method</code>, <code class="language-plaintext highlighter-rouge">:authority</code>, <code class="language-plaintext highlighter-rouge">:scheme</code> and <code class="language-plaintext highlighter-rouge">:path</code> whose meaning is clear. But here’s the funny thing. curl sends them out in a different order! Look:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Stream: HEADERS, Stream ID: 1, Length 434, GET /
    ...
    Header: :method: GET
    Header: :path: /
    Header: :scheme: https
    Header: :authority: wikipedia.org
    ...
</code></pre></div></div>
<p>This is completely fine from an HTTP standpoint, but is being leveraged to fingerprint our client. curl, Firefox, Chrome - each sends them out in a different order.</p>

<p>You can’t control the order of the pseudo-headers from the curl command line. It’s hard-coded into curl’s code, and it’s always the same. Luckily, <a href="https://github.com/lwthiker/curl-impersonate/blob/main/chrome/patches/curl-http2-c.patch">the fix</a> is simple and involves re-ordering them into the desired order.</p>
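
<p>A fingerprinter only needs the order of the pseudo-headers, not their values. Here is a minimal Python sketch of how such a signature could be derived from the two frames above. The function name and the one-letter encoding are my own choice for illustration, and I don’t know how any particular vendor actually implements this.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python">def pseudo_header_order(headers):
    """Return the order of HTTP/2 pseudo-headers as a short signature,
    e.g. "m,a,s,p" for :method, :authority, :scheme, :path."""
    return ",".join(name[1] for name, _ in headers if name.startswith(":"))


chrome_headers = [(":method", "GET"), (":authority", "wikipedia.org"),
                  (":scheme", "https"), (":path", "/"),
                  ("user-agent", "...")]
curl_headers = [(":method", "GET"), (":path", "/"),
                (":scheme", "https"), (":authority", "wikipedia.org"),
                ("user-agent", "...")]
print(pseudo_header_order(chrome_headers))
print(pseudo_header_order(curl_headers))</code></pre></figure>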

<h2 id="concluding">Concluding</h2>

<p>After matching the TLS signature and the HTTP/2 signature, <code class="language-plaintext highlighter-rouge">curl-impersonate</code> now behaves similarly enough to Chrome to trick TLS fingerprinters. In the repository you may find <a href="https://github.com/lwthiker/curl-impersonate/blob/main/chrome/curl_chrome98">curl_chrome98</a>, a wrapper script that launches <code class="language-plaintext highlighter-rouge">curl-impersonate</code> with all the correct headers and flags to make it impersonate Chrome 98 on a Windows 10 machine.</p>

<p>Impersonating browsers is an endless cat-and-mouse game. The rapid release of new browser versions means TLS signatures change by the month. Tomorrow Chrome may come up with another Google-specific extension, or start using <a href="https://blog.mozilla.org/security/2021/01/07/encrypted-client-hello-the-future-of-esni-in-firefox/">Encrypted Client Hello</a>, or even turn on <a href="https://en.wikipedia.org/wiki/HTTP/3">HTTP/3</a> by default. Each such change will require a different set of modifications for <code class="language-plaintext highlighter-rouge">curl-impersonate</code> to work.</p>

<p><br /></p>

<hr />

<p><br /></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>curl adds the ALPS extension to the Client Hello. For ALPS to fully work, the server needs to respond with an encrypted ALPS extension, and the client needs to send its application settings back (e.g. the HTTP/2 SETTINGS frame). I couldn’t test how curl behaves in this situation as no server seems to support it right now, not even google.com. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="reversing" /><summary type="html"><![CDATA[This is a continuation of the previous post. If you didn’t read it, please go ahead and read at least until the TL;DR section. In summary, various web services perform TLS fingerprinting to identify whether you run a real browser like Chrome or Firefox or whether it is a tool like curl or a Python script. I created curl-impersonate, a modified version of curl that performs TLS handshakes which are identical to Firefox’s, thereby tricking said services to believe it is a real browser.]]></summary></entry><entry><title type="html">Making curl impersonate Firefox</title><link href="https://lwthiker.com/reversing/2022/02/17/curl-impersonate-firefox.html" rel="alternate" type="text/html" title="Making curl impersonate Firefox" /><published>2022-02-17T16:00:00+00:00</published><updated>2022-02-17T16:00:00+00:00</updated><id>https://lwthiker.com/reversing/2022/02/17/curl-impersonate-firefox</id><content type="html" xml:base="https://lwthiker.com/reversing/2022/02/17/curl-impersonate-firefox.html"><![CDATA[<p><strong>Update</strong>: The second part about impersonating Chrome <a href="/reversing/2022/02/20/impersonating-chrome-too.html">is up</a>.</p>

<p>In the <a href="/reversing/2022/02/12/analyzing-stock-exchange-api.html">last post</a> I analyzed an API used by a website to fetch data and display it to the user. I did that in order to automate fetching that same data once a day. The API required customized HTTP headers which I guess were some sort of bot protection. This time I faced a much more sophisticated mechanism: a commercial bot protection solution.</p>

<p>Bot protections are designed to protect websites against web scraping. There are a lot of commercial solutions available from well-known companies. Here I was getting blocked by one of them; let’s call the company by the fake name <strong>Protectify</strong>.</p>

<p>My motivation was similar to the last post. I wanted to perform a single GET request to a webpage automatically once a day. When using the browser, the website immediately returns the correct content. However, when using <code class="language-plaintext highlighter-rouge">curl</code> or a Python script to perform the exact same GET request, we get back:</p>

<figure class="highlight"><pre><code class="language-http" data-lang="http"><span class="k">HTTP</span><span class="o">/</span><span class="m">1.1</span> <span class="m">503</span> <span class="ne">Service Temporarily Unavailable</span>
<span class="s">...</span>
<span class="na">Server</span><span class="p">:</span> <span class="s">protectify</span>
<span class="s">...</span>

Checking your browser before accessing www.secured-by-protectify.com
This process is automatic. Your browser will redirect to your requested content shortly.</code></pre></figure>

<p>The returned HTML also contains some obfuscated JavaScript code. Basically what’s happening is that the website is served by Protectify’s servers. They somehow detected the use of an automated tool to perform the HTTP request, and served us a JavaScript-based challenge that only a real browser would be able to solve.</p>

<p>The data I was trying to fetch was publicly available information which could be taken from other sources. However, this piqued my interest. <strong>A real browser does not get the JS challenge, but is immediately served the real content. How could Protectify know that I was using <code class="language-plaintext highlighter-rouge">curl</code> to access the website?</strong></p>

<h2 id="tldr">TL;DR</h2>
<ul>
  <li>Protectify’s servers fingerprint the HTTP client used (e.g. browser, curl) before serving back content.</li>
  <li>They use a variety of parameters, most notably the TLS handshake and the HTTP headers.</li>
  <li>In case your fingerprint does not match that of a known browser, the Javascript challenge is served instead of the real content.</li>
</ul>

<p>To bypass it,</p>
<ul>
  <li>I compiled a special version of <code class="language-plaintext highlighter-rouge">curl</code> that behaves, network-wise, identically to Firefox.  I called it <code class="language-plaintext highlighter-rouge">curl-impersonate</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">curl-impersonate</code> is able to trick Protectify and gets served the real content.</li>
  <li>You can find a Docker image that compiles it in <a href="https://github.com/lwthiker/curl-impersonate">this repository</a>.</li>
</ul>

<p>This was done in a very hacky way, but I hope the findings below can be turned into a real project. Imagine that you could run:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl --impersonate ff95
</code></pre></div></div>
<p>and it would behave exactly like Firefox 95. It could then be wrapped in a nice Python library.</p>

<p>Anyway, here are the technical details.</p>

<h2 id="the-technical-details">The technical details</h2>
<p>Let’s try to understand how Protectify identifies that we are a bot. At first I tried to send the exact same HTTP headers that Firefox sends. I used Firefox 95 on a Windows virtual machine to see what headers are sent. I then ran <code class="language-plaintext highlighter-rouge">curl</code> with the exact same headers:</p>

<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span>curl <span class="s1">'https://secured-by-protectify.com'</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s1">'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0'</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s1">'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8'</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s1">'Accept-Language: en-US,en;q=0.5'</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s1">'Accept-Encoding: gzip, deflate, br'</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s1">'Connection: keep-alive'</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s1">'Upgrade-Insecure-Requests: 1'</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s1">'Sec-Fetch-Dest: document'</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s1">'Sec-Fetch-Mode: navigate'</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s1">'Sec-Fetch-Site: none'</span> <span class="se">\</span>
    <span class="nt">-H</span> <span class="s1">'Sec-Fetch-User: ?1'</span></code></pre></figure>

<p>This doesn’t work. We get back <code class="language-plaintext highlighter-rouge">HTTP/1.1 503 Service Temporarily Unavailable</code>.</p>

<p>There is also an open-source Python package which claims to “bypass Protectify’s anti-bot page”. It didn’t work with this site either.</p>

<h3 id="the-tls-handshake">The TLS handshake</h3>

<p>When an HTTP client opens a connection to a website with SSL/TLS enabled (i.e. https://…) it first performs a TLS handshake. The handshake’s purpose is to verify the other side’s authenticity and establish the encrypted connection. The first message sent by the client is called “Client Hello” and it contains quite a lot of TLS parameters. Here is a Wireshark capture from a regular <code class="language-plaintext highlighter-rouge">curl</code> invocation:</p>

<p><img src="/assets/img/impersonatingff1.png" alt="Client Hello" /></p>

<p>I’m far from a TLS expert, but it is clear that in this message alone there is a myriad of parameters, extensions and configurations which are sent by our client. Each TLS client will send a different “Client Hello” message, and it has been known for a long time that it can be used to identify which browser or tool initiated the connection. See, for example, the <a href="https://github.com/salesforce/ja3">ja3 project</a>.</p>

<h3 id="the-cipher-suites-list">The “Cipher Suites” list</h3>

<p>Part of the “Client Hello” message is the Cipher Suites list, visible above. It indicates to the server what encryption methods the client supports. This is what curl’s cipher suite list looks like by default:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Cipher Suites (31 suites)
    Cipher Suite: TLS_AES_256_GCM_SHA384 (0x1302)
    Cipher Suite: TLS_CHACHA20_POLY1305_SHA256 (0x1303)
    Cipher Suite: TLS_AES_128_GCM_SHA256 (0x1301)
    Cipher Suite: TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 (0xc02c)
    Cipher Suite: TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (0xc030)
    Cipher Suite: TLS_DHE_RSA_WITH_AES_256_GCM_SHA384 (0x009f)
    Cipher Suite: TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 (0xcca9)
    ...
</code></pre></div></div>

<p>Notably, curl sends 31 different possible ciphers. Compare it to Firefox’s 17, which are also ordered differently:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Cipher Suites (17 suites)
    Cipher Suite: TLS_AES_128_GCM_SHA256 (0x1301)
    Cipher Suite: TLS_CHACHA20_POLY1305_SHA256 (0x1303)
    Cipher Suite: TLS_AES_256_GCM_SHA384 (0x1302)
    Cipher Suite: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 (0xc02b)
    Cipher Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (0xc02f)
    Cipher Suite: TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 (0xcca9)
    Cipher Suite: TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 (0xcca8)
    ...
</code></pre></div></div>
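
<p>The difference between the two lists is easy to check programmatically. The Python sketch below compares only the first seven suites of each capture, since the rest is elided above, so the result covers just that prefix. The function name <code class="language-plaintext highlighter-rouge">compare_cipher_lists</code> is hypothetical.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"># First seven cipher suite codes from each capture above.
curl_ciphers = [0x1302, 0x1303, 0x1301, 0xC02C, 0xC030, 0x009F, 0xCCA9]
firefox_ciphers = [0x1301, 0x1303, 0x1302, 0xC02B, 0xC02F, 0xCCA9, 0xCCA8]


def compare_cipher_lists(client, reference):
    """Report how a client's cipher list differs from a reference list."""
    return {
        "same_order": client == reference,
        "only_in_client": [c for c in client if c not in reference],
        "only_in_reference": [c for c in reference if c not in client],
    }


result = compare_cipher_lists(curl_ciphers, firefox_ciphers)
print(result["same_order"])
print([hex(c) for c in result["only_in_client"]])
print([hex(c) for c in result["only_in_reference"]])</code></pre></figure>

<p>Both the membership and the order matter here: even the suites the two clients share appear at different positions.</p>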

<p>It is highly likely that Protectify uses this list to detect known browsers. Hence my first attempt was to cause <code class="language-plaintext highlighter-rouge">curl</code> to use the same cipher suite as Firefox. I converted the list to OpenSSL’s format using <a href="https://wiki.mozilla.org/Security/Cipher_Suites">this reference</a> and tried my luck with the <code class="language-plaintext highlighter-rouge">--ciphers</code> option:</p>

<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span>curl <span class="s1">'https://secured-by-protectify.com'</span> <span class="se">\</span>
    <span class="nt">--ciphers</span> TLS_AES_128_GCM_SHA256,TLS_CHACHA20_POLY1305_SHA256,TLS_AES_256_GCM_SHA384,ECDHE-ECDSA-AES128-GCM-SHA256,ECDHE-RSA-AES128-GCM-SHA256,ECDHE-ECDSA-CHACHA20-POLY1305,ECDHE-RSA-CHACHA20-POLY1305,ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384,ECDHE-ECDSA-AES256-SHA,ECDHE-ECDSA-AES128-SHA,ECDHE-RSA-AES128-SHA,ECDHE-RSA-AES256-SHA,AES128-GCM-SHA256,AES256-GCM-SHA384,AES128-SHA,AES256-SHA <span class="se">\</span>
    <span class="nt">-H</span> <span class="s1">'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0'</span> <span class="se">\</span>
    ...</code></pre></figure>

<p>Well, it fails. <code class="language-plaintext highlighter-rouge">503 Service Temporarily Unavailable</code> again. Looking at Wireshark, the Client Hello contains 18 cipher suites, even though we requested only 17. <a href="https://www.openssl.org/">OpenSSL</a>, the library curl uses by default for TLS, automatically added the following one:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Cipher Suite: TLS_EMPTY_RENEGOTIATION_INFO_SCSV (0x00ff)
</code></pre></div></div>

<p>This behavior is <a href="https://wiki.openssl.org/index.php/SSL_MODE_SEND_FALLBACK_SCSV">documented by OpenSSL</a> but I could not find a way to disable it. This makes it extremely easy to detect OpenSSL clients. <code class="language-plaintext highlighter-rouge">curl</code> and Python use OpenSSL, but no major browser does. We’ll have to choose a different route.</p>
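<p>To make the detection idea concrete, here is a minimal sketch of how a server-side check could compare the cipher suites it expects from a browser with the ones it actually observes. The helper <code class="language-plaintext highlighter-rouge">diff_cipher_suites</code> is hypothetical, only the first seven suites from the Firefox capture are used because the rest are elided above, and placing the extra suite at the end of the list is an assumption made for illustration. I have no knowledge of how Protectify actually implements its check.</p>

```python
# Hypothetical helper: compare the cipher suites a client is expected to send
# with the ones observed in its Client Hello.
TLS_EMPTY_RENEGOTIATION_INFO_SCSV = 0x00FF


def diff_cipher_suites(expected, observed):
    """Return extra and missing suites, and whether the shared ones keep their order."""
    extra = [c for c in observed if c not in expected]
    missing = [c for c in expected if c not in observed]
    shared_observed = [c for c in observed if c in expected]
    shared_expected = [c for c in expected if c in observed]
    return {
        "extra": extra,
        "missing": missing,
        "same_order": shared_observed == shared_expected,
    }


# First seven suites from the Firefox capture above (the rest are elided there).
firefox_prefix = [0x1301, 0x1303, 0x1302, 0xC02B, 0xC02F, 0xCCA9, 0xCCA8]

# Assumed for illustration: the same list with the extra suite appended.
observed = firefox_prefix + [TLS_EMPTY_RENEGOTIATION_INFO_SCSV]

report = diff_cipher_suites(firefox_prefix, observed)
print([hex(c) for c in report["extra"]])
```

<p>A single unexpected entry is enough to tell the two clients apart, even when every requested suite is present and in the right order.</p>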

<h3 id="using-nss">Using NSS</h3>

<p>Firefox does not use OpenSSL. It uses <a href="https://firefox-source-docs.mozilla.org/security/nss/index.html">NSS</a>, another library for TLS communications. Luckily, <code class="language-plaintext highlighter-rouge">curl</code> can be compiled against a large range of TLS libraries, NSS included. So I compiled curl against NSS instead of OpenSSL. This was pretty technical and took a while to figure out. You can find the full build procedure at the <a href="https://github.com/lwthiker/curl-impersonate">repository</a>. I named the resulting binary <code class="language-plaintext highlighter-rouge">curl-impersonate</code>.</p>

<p>With this in hand, I once more converted the cipher list into the right format, which can be found in <a href="https://github.com/curl/curl/blob/master/lib/vtls/nss.c">this curl source file</a>. Running our new <code class="language-plaintext highlighter-rouge">curl-impersonate</code>:</p>

<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span>curl-impersonate <span class="s1">'https://secured-by-protectify.com'</span> <span class="se">\</span>
    <span class="nt">--ciphers</span> aes_128_gcm_sha_256,chacha20_poly1305_sha_256,aes_256_gcm_sha_384,ecdhe_ecdsa_aes_128_gcm_sha_256,ecdhe_rsa_aes_128_gcm_sha_256,ecdhe_ecdsa_chacha20_poly1305_sha_256,ecdhe_rsa_chacha20_poly1305_sha_256,ecdhe_ecdsa_aes_256_gcm_sha_384,ecdhe_rsa_aes_256_gcm_sha_384,ecdhe_ecdsa_aes_256_sha,ecdhe_ecdsa_aes_128_sha,ecdhe_rsa_aes_128_sha,ecdhe_rsa_aes_256_sha,rsa_aes_128_gcm_sha_256,rsa_aes_256_gcm_sha_384,rsa_aes_128_sha,rsa_aes_256_sha <span class="se">\</span>
    <span class="nt">-H</span> <span class="s1">'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:95.0) Gecko/20100101 Firefox/95.0'</span> <span class="se">\</span>
    <span class="nt">-H</span> ...</code></pre></figure>

<p>and… it fails again. However, looking at Wireshark, the Cipher Suite option now matches exactly the one Firefox sends. Left is <code class="language-plaintext highlighter-rouge">curl-impersonate</code>, right is Firefox:</p>

<p><img src="/assets/img/impersonatingff3.png" alt="Cipher Suite Comparison" /></p>

<p>So we are heading in the right direction.</p>

<h3 id="the-rest-of-the-client-hello-message">The rest of the Client Hello message</h3>
<p>The Cipher Suites list is just one part of the Client Hello message. Most importantly, the Client Hello contains a list of TLS extensions. Each client produces a different set of extensions by default. Anti-bot mechanisms use this to identify which HTTP client was used. The goal here was to make <code class="language-plaintext highlighter-rouge">curl-impersonate</code> produce the exact same extension list as Firefox. I will detail some of the process. The bottom line is that by playing with curl’s source code, and putting in the right modifications, I managed to make its Client Hello message look <em>exactly</em> like Firefox’s.</p>

<p>Here is the Client Hello message that Firefox sends by default (Firefox 95, Windows, non-incognito):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Handshake Protocol: Client Hello
    Handshake Type: Client Hello (1)
    Length: 508
    Version: TLS 1.2 (0x0303)
    ...
    Session ID Length: 32
    Session ID: 22de422dd343bb2bccead1e060098037ae5793bae952b20c…
    ...
    Extensions Length: 401
    Extension: server_name (len=17)
    Extension: extended_master_secret (len=0)
    Extension: renegotiation_info (len=1)
    Extension: supported_groups (len=14)
    Extension: ec_point_formats (len=2)
    Extension: session_ticket (len=0)
    Extension: application_layer_protocol_negotiation (len=14)
    Extension: status_request (len=5)
    Extension: delegated_credentials (len=10)
    Extension: key_share (len=107)
    Extension: supported_versions (len=5)
    Extension: signature_algorithms (len=24)
    Extension: psk_key_exchange_modes (len=2)
    Extension: record_size_limit (len=2)
    Extension: padding (len=138)
</code></pre></div></div>
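<p>As a rough illustration of why the extension list works as a fingerprint, the sketch below turns the capture above into an ordered, comparable string and cross-checks the reported <code class="language-plaintext highlighter-rouge">Extensions Length</code>. The names and lengths are copied from the capture. The numeric extension types are my reading of the IANA registry and are worth double-checking, and both function names are made up for this example.</p>

```python
# (name, extension type, body length) in the order Firefox sent them.
# Names and lengths come from the capture above. The numeric types are
# taken from the IANA TLS ExtensionType registry as I understand it.
FIREFOX_EXTENSIONS = [
    ("server_name", 0, 17),
    ("extended_master_secret", 23, 0),
    ("renegotiation_info", 65281, 1),
    ("supported_groups", 10, 14),
    ("ec_point_formats", 11, 2),
    ("session_ticket", 35, 0),
    ("application_layer_protocol_negotiation", 16, 14),
    ("status_request", 5, 5),
    ("delegated_credentials", 34, 10),
    ("key_share", 51, 107),
    ("supported_versions", 43, 5),
    ("signature_algorithms", 13, 24),
    ("psk_key_exchange_modes", 45, 2),
    ("record_size_limit", 28, 2),
    ("padding", 21, 138),
]


def extension_fingerprint(extensions):
    """Join the extension types in wire order; a different order gives a different string."""
    return "-".join(str(ext_type) for _, ext_type, _ in extensions)


def extensions_total_length(extensions):
    """Each extension is a 2-byte type, a 2-byte length, and then its body."""
    return sum(4 + body_len for _, _, body_len in extensions)


fingerprint = extension_fingerprint(FIREFOX_EXTENSIONS)
total_length = extensions_total_length(FIREFOX_EXTENSIONS)
print(fingerprint)
print(total_length)
```

<p>The computed total matches the <code class="language-plaintext highlighter-rouge">Extensions Length: 401</code> field in the capture, which is a handy sanity check that no extension was overlooked.</p>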

<p>Here are some of the notable changes I made to curl so that it sends the exact same message.</p>

<h4 id="alpn-and-http2">ALPN and HTTP2</h4>
<p>The presence of the <code class="language-plaintext highlighter-rouge">application_layer_protocol_negotiation</code> extension can be seen above. This is known as <a href="https://en.wikipedia.org/wiki/Application-Layer_Protocol_Negotiation">ALPN</a>. This extension is used by browsers to negotiate whether to use HTTP/1.1 or HTTP/2. By doing it as part of the TLS handshake, the browser saves a few round-trips which would otherwise happen only after the TLS session has been established. The extension’s contents look like the following:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Extension: application_layer_protocol_negotiation (len=14)
    Type: application_layer_protocol_negotiation (16)
    Length: 14
    ALPN Extension Length: 12
    ALPN Protocol
        ALPN string length: 2
        ALPN Next Protocol: h2
        ALPN string length: 8
        ALPN Next Protocol: http/1.1
</code></pre></div></div>

<p>Here Firefox tells the server that it supports both HTTP/2 (<code class="language-plaintext highlighter-rouge">h2</code>) and HTTP/1.1 (<code class="language-plaintext highlighter-rouge">http/1.1</code>).</p>
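<p>The two length fields in the dump follow directly from the encoding: each protocol name is prefixed with a one-byte length, and the whole list is prefixed with a two-byte length. A minimal sketch that rebuilds the extension bytes, assuming that layout (the function name is made up for this example):</p>

```python
import struct

ALPN_EXTENSION_TYPE = 16  # matches "Type: application_layer_protocol_negotiation (16)"


def encode_alpn_extension(protocols):
    """Serialize an ALPN extension for the given protocol names, in order."""
    # Each name is preceded by a single length byte.
    names = b"".join(bytes([len(p)]) + p.encode("ascii") for p in protocols)
    # The list of names is preceded by a 2-byte big-endian length.
    protocol_list = struct.pack("!H", len(names)) + names
    # The extension header is a 2-byte type followed by a 2-byte length.
    return struct.pack("!HH", ALPN_EXTENSION_TYPE, len(protocol_list)) + protocol_list


ext = encode_alpn_extension(["h2", "http/1.1"])
extension_length = struct.unpack("!H", ext[2:4])[0]
alpn_list_length = struct.unpack("!H", ext[4:6])[0]
print(extension_length, alpn_list_length)
```

<p>Swapping the two names produces the same lengths but different bytes, which is why the order in which a client lists them is visible to the server.</p>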

<p>To reproduce this behavior, I:</p>
<ul>
  <li>Compiled curl with <a href="https://nghttp2.org/">nghttp2</a>, the low-level library that provides the HTTP/2 implementation.</li>
  <li>Made a small modification to curl’s code, since it was sending <code class="language-plaintext highlighter-rouge">h2</code> and <code class="language-plaintext highlighter-rouge">http/1.1</code> in reverse order.</li>
  <li>Launched curl with the <code class="language-plaintext highlighter-rouge">--http2</code> flag.</li>
</ul>

<h4 id="a-few-other-extensions">A few other extensions</h4>
<p>Firefox adds the <code class="language-plaintext highlighter-rouge">status_request</code> and <code class="language-plaintext highlighter-rouge">delegated_credentials</code> extensions as can be seen above. I don’t know what they do, but curl wasn’t sending them. Here the solution was to look at the Firefox source code. Mozilla provides <a href="https://searchfox.org/">searchfox</a>, a whole site dedicated to searching the Firefox source code. It’s great! The two important files are <a href="https://searchfox.org/mozilla-central/source/security/manager/ssl/nsNSSIOLayer.cpp">nsNSSIOLayer.cpp</a> and <a href="https://searchfox.org/mozilla-central/source/security/manager/ssl/nsNSSComponent.cpp">nsNSSComponent.cpp</a>. Searching around I found the following two snippets:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// CommonInit() @ nsNSSComponent.cpp</span>
  <span class="n">SSL_OptionSetDefault</span><span class="p">(</span>
      <span class="n">SSL_ENABLE_DELEGATED_CREDENTIALS</span><span class="p">,</span>
      <span class="n">Preferences</span><span class="o">::</span><span class="n">GetBool</span><span class="p">(</span><span class="s">"security.tls.enable_delegated_credentials"</span><span class="p">,</span>
                           <span class="n">DELEGATED_CREDENTIALS_ENABLED_DEFAULT</span><span class="p">));</span></code></pre></figure>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// nsSSLIOLayerSetOptions() @ nsNSSIOLayer.cpp</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">SECSuccess</span> <span class="o">!=</span> <span class="n">SSL_OptionSet</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">SSL_ENABLE_OCSP_STAPLING</span><span class="p">,</span> <span class="n">enabled</span><span class="p">))</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">NS_ERROR_FAILURE</span><span class="p">;</span>
  <span class="p">}</span></code></pre></figure>

<p>So Firefox turns on some specific SSL options called <code class="language-plaintext highlighter-rouge">SSL_ENABLE_DELEGATED_CREDENTIALS</code> and <code class="language-plaintext highlighter-rouge">SSL_ENABLE_OCSP_STAPLING</code>. Without really understanding what their purpose is, I added similar snippets to curl, and now it sends the desired extensions in the Client Hello. I continued this process for 7 or 8 extensions in total. Some were missing, some were configured differently, and it took some tinkering to figure everything out. The full patch can be found at the <a href="https://github.com/lwthiker/curl-impersonate/blob/main/curl-lib-nss.patch">repo</a>.</p>

<h4 id="session-id">Session ID</h4>
<p>TLS Session IDs are another optimization mechanism that saves the browser from re-doing a full TLS handshake. Quoting from <a href="https://hpbn.co/transport-layer-security-tls/#tls-session-resumption">this book</a>:</p>
<blockquote>
  <p>… the client can include the session ID in the ClientHello message to indicate to the server that it still remembers the negotiated cipher suite and keys from previous handshake and is able to reuse them. In turn, if the server is able to find the session parameters associated with the advertised ID in its cache, then an abbreviated handshake (Figure 4-3) can take place.</p>
</blockquote>

<p>But here is the curious thing: Firefox <em>always</em> includes a session ID, even when connecting to a never-visited-before site. This is how it looks:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    Session ID Length: 32
    Session ID: 22de422dd343bb2bccead1e060098037ae5793bae952b20c…
</code></pre></div></div>
<p>while curl’s is just empty:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    Session ID Length: 0
</code></pre></div></div>

<p>This took quite a deep look in the NSS/Firefox source code to figure out. The relevant function is <a href="https://searchfox.org/mozilla-central/search?q=symbol:ssl3_CreateClientHelloPreamble&amp;redirect=false">ssl3_CreateClientHelloPreamble</a> which builds the Client Hello message. Under certain circumstances, it adds a fake session ID:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="p">...</span>
<span class="k">else</span> <span class="nf">if</span> <span class="p">(</span><span class="n">ss</span><span class="o">-&gt;</span><span class="n">opt</span><span class="p">.</span><span class="n">enableTls13CompatMode</span> <span class="o">&amp;&amp;</span> <span class="o">!</span><span class="n">IS_DTLS</span><span class="p">(</span><span class="n">ss</span><span class="p">))</span> <span class="p">{</span>
    <span class="cm">/* We're faking session resumption, so rather than create new
     * randomness, just mix up the client random a little. */</span>
    <span class="n">PRUint8</span> <span class="n">buf</span><span class="p">[</span><span class="n">SSL3_SESSIONID_BYTES</span><span class="p">];</span>
    <span class="n">ssl_MakeFakeSid</span><span class="p">(</span><span class="n">ss</span><span class="p">,</span> <span class="n">buf</span><span class="p">);</span>
    <span class="n">rv</span> <span class="o">=</span> <span class="n">sslBuffer_AppendVariable</span><span class="p">(</span><span class="o">&amp;</span><span class="n">constructed</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">SSL3_SESSIONID_BYTES</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>

<p>I don’t really understand why. If anyone does, please let me know<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. To enable similar behavior in <code class="language-plaintext highlighter-rouge">curl-impersonate</code> I had to turn on “TLS1.3 compat mode” (which can be seen in the <code class="language-plaintext highlighter-rouge">if</code> condition above). Firefox does this as well. This is from the Firefox code:</p>

<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// nsSSLIOLayerSetOptions() @ nsNSSIOLayer.cpp</span>

  <span class="c1">// Set TLS 1.3 compat mode.</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">SECSuccess</span> <span class="o">!=</span> <span class="n">SSL_OptionSet</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">SSL_ENABLE_TLS13_COMPAT_MODE</span><span class="p">,</span> <span class="n">PR_TRUE</span><span class="p">))</span> <span class="p">{</span>
      <span class="p">...</span></code></pre></figure>

<p>Putting a similar call in <code class="language-plaintext highlighter-rouge">curl-impersonate</code> makes it send fake session IDs as well.</p>

<h2 id="the-result">The result</h2>
<p>The resulting curl binary, after all source-code modifications and using the right flags, sends a TLS Client Hello message that looks <em>exactly</em> like the one Firefox sends. Here is a side-by-side comparison:
<img src="/assets/img/impersonatingff2.png" alt="Client Hello" /></p>

<p>I can’t tell the difference, and Protectify can’t either. It bypasses the bot protection entirely.</p>

<p><a href="https://github.com/lwthiker/curl-impersonate">This repository</a> contains a <a href="https://github.com/lwthiker/curl-impersonate/blob/main/firefox/Dockerfile">Dockerfile</a> that will build it for you. The resulting image includes:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">curl-impersonate</code>, a modified curl binary with all the required TLS tweaks.</li>
  <li><code class="language-plaintext highlighter-rouge">curl_ff95</code>, a wrapper bash script that will launch <code class="language-plaintext highlighter-rouge">curl-impersonate</code> with the correct parameters to make it look like Firefox 95 on Windows.</li>
</ul>

<h2 id="concluding">Concluding</h2>

<p>The modified curl behaves like a real browser, at least from the TLS viewpoint. It bypasses this specific company’s bot protection mechanism.</p>

<p>Honestly, that company did a pretty great job there. If your TLS handshake and HTTP headers don’t exactly match those of a real browser, you get blocked. If you use a real browser, you don’t notice anything. I would use their solution if I needed one.</p>

<p>Remember that this was just one bot protection mechanism. There are others which are more aggressive. I don’t expect the above to work for you if you do massive web scraping. For fetching a single page once a day it works well, at least until they figure it out and update their bot protection to use other tricks.</p>

<p><br /></p>

<hr />

<p><br /></p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Update: I now understand that this was implemented as a bridge for adoption of TLS 1.3. More information in this <a href="https://blog.cloudflare.com/why-tls-1-3-isnt-in-browsers-yet/">Cloudflare blog post</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name></name></author><category term="reversing" /><summary type="html"><![CDATA[Update: The second part about impersonating Chrome is up.]]></summary></entry><entry><title type="html">Analyzing a stock exchange’s API</title><link href="https://lwthiker.com/reversing/2022/02/12/analyzing-stock-exchange-api.html" rel="alternate" type="text/html" title="Analyzing a stock exchange’s API" /><published>2022-02-12T10:30:00+00:00</published><updated>2022-02-12T10:30:00+00:00</updated><id>https://lwthiker.com/reversing/2022/02/12/analyzing-stock-exchange-api</id><content type="html" xml:base="https://lwthiker.com/reversing/2022/02/12/analyzing-stock-exchange-api.html"><![CDATA[<p>This was a fun afternoon reverse engineering project so I figured I’d write a bit about it.</p>

<p>I’m developing a web app, <a href="https://pumbaa.app">Pumbaa Backtester</a>, which is a small tool to simulate the historical performance of index-based investments. As part of the development I wanted to fetch long-term historical data for an ETF traded at a medium-size stock exchange. I won’t write exactly which one, but if you are curious you’ll figure it out.</p>

<p>Each day a closing price for the ETF is determined, which is pretty much like the price of a stock at the end of the trading day. What I needed are closing prices since the ETF was created 22 years ago. Browsing a bit at the stock exchange’s site I got to the following form:</p>

<p><img src="/assets/img/stockexchange1.png" alt="Historical price form" /></p>

<p>Great! This gives the data I want. The goal is to automate fetching these prices - I want it to be done automatically once a day. So let’s fire up Firefox network monitor (Ctrl+Shift+E) and see what happens when we press “Search”:</p>

<p><img src="/assets/img/stockexchange2.png" alt="Firefox network monitor" /></p>

<p>Looks simple enough - an API with the parameters <code class="language-plaintext highlighter-rouge">isin</code> (unique id of the ETF), <code class="language-plaintext highlighter-rouge">minDate</code> and <code class="language-plaintext highlighter-rouge">maxDate</code>.</p>

<h3 id="first-attempts">First attempts</h3>

<p>If we attempt to access the API with curl:</p>

<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span>curl <span class="nt">-X</span> GET <span class="nt">-G</span>   <span class="se">\</span>
    <span class="s1">'https://api.stock-exchange.com/v1/data/price_history'</span>  <span class="se">\</span>
        <span class="nt">-d</span> <span class="s2">"limit=50"</span>                                       <span class="se">\</span>
        <span class="nt">-d</span> <span class="s2">"offset=0"</span>                                       <span class="se">\</span>
        <span class="nt">-d</span> <span class="s2">"isin=</span><span class="nv">$ISIN</span><span class="s2">"</span>                                     <span class="se">\</span>
        <span class="nt">-d</span> <span class="s2">"minDate=2021-02-10"</span>                             <span class="se">\</span>
        <span class="nt">-d</span> <span class="s2">"maxDate=2022-02-10"</span>                             <span class="se">\</span>
        <span class="nt">-d</span> <span class="s2">"cleanSplit=false"</span>                               <span class="se">\</span>
        <span class="nt">-d</span> <span class="s2">"cleanPayout=false"</span>                              <span class="se">\</span>
        <span class="nt">-d</span> <span class="s2">"cleanSubscriptionRights=false"</span>
<span class="o">{}</span></code></pre></figure>

<p>we get back an empty JSON response. At this point the most likely explanation is that we are missing one of the HTTP headers; it could be a Cookie header or something else. Looking at the original request’s headers, everything is quite standard except for the trio <code class="language-plaintext highlighter-rouge">Client-Date</code>, <code class="language-plaintext highlighter-rouge">X-Client-TraceId</code> and <code class="language-plaintext highlighter-rouge">X-Security</code>:</p>

<figure class="highlight"><pre><code class="language-http" data-lang="http"><span class="err">Client-Date: 2022-02-12T08:58:52.208Z
X-Client-TraceId: bbbfec1ad15ca1e16cd72fba9e8a7241
X-Security: 185111eb1d17ea0bf0928f2655d05254</span></code></pre></figure>

<p>These are not documented on <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers">MDN</a> so they must be something unique to this API. You might wonder whether we could just send the exact same headers again. That does work for a few minutes, but then it stops working. We’ll have to find out the logic behind them.</p>

<p><code class="language-plaintext highlighter-rouge">Client-Date</code> is simple enough: it’s just the current time. The other two are hex-encoded 16-byte values, so maybe they are just random UUIDs? Let’s try:</p>

<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span>curl <span class="nt">-X</span> GET <span class="nt">-G</span>   <span class="se">\</span>
    <span class="s1">'https://api.stock-exchange.com/v1/data/price_history'</span>  <span class="se">\</span>
        <span class="nt">-H</span> <span class="s2">"Client-Date: 2022-02-12T08:58:52.208Z"</span>          <span class="se">\</span>
        <span class="nt">-H</span> <span class="s2">"X-Client-TraceId: </span><span class="si">$(</span>uuidgen <span class="nt">-r</span> | <span class="nb">tr</span> <span class="nt">-d</span> <span class="s1">'-'</span><span class="si">)</span><span class="s2">"</span>    <span class="se">\</span>
        <span class="nt">-H</span> <span class="s2">"X-Security: </span><span class="si">$(</span>uuidgen <span class="nt">-r</span> | <span class="nb">tr</span> <span class="nt">-d</span> <span class="s1">'-'</span><span class="si">)</span><span class="s2">"</span>          <span class="se">\</span>
        <span class="nt">-d</span> <span class="s1">'limit=50'</span>                                       <span class="se">\</span>
         ...
<span class="o">{}</span></code></pre></figure>

<p>Nope, another empty JSON. There must be some logic then that generates these headers in Javascript.</p>

<h3 id="finding-the-origin">Finding the origin</h3>
<p>Searching for the string <code class="language-plaintext highlighter-rouge">X-Client-TraceId</code> through the JS scripts that the page uses, we find the culprit:</p>

<p><img src="/assets/img/stockexchange3.png" alt="Javascript" /></p>

<p>The script <code class="language-plaintext highlighter-rouge">main-es2015.3f13e42ead3dc41c6dc3.js</code> is a one-line, minified script, probably generated by <a href="https://webpack.js.org/">webpack</a>. Why a page with a single form would need 3MB of Javascript is really beyond me. Anyway, after <a href="https://github.com/beautify-web/js-beautify">beautifying</a> it we can look at the snippet that generates the three headers:</p>

<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="kd">class</span> <span class="nx">o</span> <span class="p">{</span>
    <span class="kd">static</span> <span class="nx">generateHeaders</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span class="p">{</span>
        <span class="kd">const</span> <span class="nx">e</span> <span class="o">=</span> <span class="nx">i</span><span class="p">().</span><span class="nx">toISOString</span><span class="p">();</span>
        <span class="kd">let</span> <span class="nx">n</span> <span class="o">=</span> <span class="nx">e</span> <span class="o">+</span> <span class="nx">t</span> <span class="o">+</span> <span class="nx">r</span><span class="p">.</span><span class="nx">N</span><span class="p">.</span><span class="nx">tracing</span><span class="p">.</span><span class="nx">salt</span><span class="p">;</span>
        <span class="k">return</span> <span class="nx">n</span> <span class="o">=</span> <span class="nx">s</span><span class="p">.</span><span class="nx">V</span><span class="p">.</span><span class="nx">hashStr</span><span class="p">(</span><span class="nx">n</span><span class="p">).</span><span class="nx">toString</span><span class="p">(),</span> <span class="p">{</span>
            <span class="dl">"</span><span class="s2">Client-Date</span><span class="dl">"</span><span class="p">:</span> <span class="nx">e</span><span class="p">,</span>
            <span class="dl">"</span><span class="s2">X-Client-TraceId</span><span class="dl">"</span><span class="p">:</span> <span class="nx">n</span><span class="p">,</span>
            <span class="dl">"</span><span class="s2">X-Security</span><span class="dl">"</span><span class="p">:</span> <span class="nx">s</span><span class="p">.</span><span class="nx">V</span><span class="p">.</span><span class="nx">hashStr</span><span class="p">(</span><span class="nx">i</span><span class="p">().</span><span class="nx">format</span><span class="p">(</span><span class="dl">"</span><span class="s2">YYYYMMDDHHmm</span><span class="dl">"</span><span class="p">)).</span><span class="nx">toString</span><span class="p">()</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span></code></pre></figure>

<p>At first I tried to approach this like a programmer, understanding where each variable comes from. But in a 90k-line script where everything is called <code class="language-plaintext highlighter-rouge">t</code>, <code class="language-plaintext highlighter-rouge">i</code>, and <code class="language-plaintext highlighter-rouge">r</code> it’s quite impossible. It doesn’t help that the surrounding code looks like some form of alien code:</p>

<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="mi">63205</span><span class="p">:</span> <span class="kd">function</span><span class="p">(</span><span class="nx">t</span><span class="p">,</span> <span class="nx">e</span><span class="p">,</span> <span class="nx">n</span><span class="p">)</span> <span class="p">{</span>
    <span class="dl">"</span><span class="s2">use strict</span><span class="dl">"</span><span class="p">;</span>
    <span class="nx">n</span><span class="p">.</span><span class="nx">d</span><span class="p">(</span><span class="nx">e</span><span class="p">,</span> <span class="p">{</span>
        <span class="na">N</span><span class="p">:</span> <span class="kd">function</span><span class="p">()</span> <span class="p">{</span>
            <span class="k">return</span> <span class="nx">o</span>
        <span class="p">}</span>
    <span class="p">});</span>
    <span class="kd">var</span> <span class="nx">i</span> <span class="o">=</span> <span class="nx">n</span><span class="p">(</span><span class="mi">16738</span><span class="p">),</span>
        <span class="nx">r</span> <span class="o">=</span> <span class="nx">n</span><span class="p">(</span><span class="mi">92340</span><span class="p">),</span>
        <span class="nx">s</span> <span class="o">=</span> <span class="nx">n</span><span class="p">(</span><span class="mi">9346</span><span class="p">);</span>
    <span class="kd">class</span> <span class="nx">o</span> <span class="p">{</span>
        <span class="kd">static</span> <span class="nx">generateHeaders</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span class="p">{</span>
            <span class="p">...</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span></code></pre></figure>

<p>So let’s just use some common sense and go header-by-header:</p>

<h4 id="client-date">Client-Date</h4>
<p>This is the current time, converted to a string with Javascript’s <code class="language-plaintext highlighter-rouge">toISOString()</code> function.</p>

<h4 id="x-security">X-Security</h4>
<p>Here is the snippet again for convenience:</p>

<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="dl">"</span><span class="s2">X-Security</span><span class="dl">"</span><span class="p">:</span> <span class="nx">s</span><span class="p">.</span><span class="nx">V</span><span class="p">.</span><span class="nx">hashStr</span><span class="p">(</span><span class="nx">i</span><span class="p">().</span><span class="nx">format</span><span class="p">(</span><span class="dl">"</span><span class="s2">YYYYMMDDHHmm</span><span class="dl">"</span><span class="p">)).</span><span class="nx">toString</span><span class="p">()</span></code></pre></figure>

<p>We can guess that it’s a hash of the current time, after being converted to the format <code class="language-plaintext highlighter-rouge">YYYYMMDDHHmm</code>. Which hash? The result is 16 bytes long, so the most probable candidate is md5. Let’s check:</p>

<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span><span class="nb">echo</span> <span class="nt">-n</span> <span class="s1">'202202120858'</span> | <span class="nb">md5sum
</span>f627c44850a16146d60590eb9584bac3</code></pre></figure>

<p>Doesn’t match… maybe we need to use the local time instead?</p>

<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><span class="nv">$ </span><span class="nb">echo</span> <span class="nt">-n</span> <span class="s1">'202202121058'</span> | <span class="nb">md5sum
</span>185111eb1d17ea0bf0928f2655d05254</code></pre></figure>

<p>It matches! So we got this header as well.</p>
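<p>The same computation can be sketched in Python. The fixed UTC+2 offset is an assumption inferred from this one capture, where the 08:58 UTC request only matched when hashed as 10:58. A real client would need whatever local time zone the browser reports, and the function name is made up for this example.</p>

```python
import datetime
import hashlib

# Assumption: the browser's local time zone was UTC+2 when the request was
# captured, inferred from 08:58 UTC only matching when hashed as 10:58.
ASSUMED_LOCAL_TZ = datetime.timezone(datetime.timedelta(hours=2))


def x_security(now_utc, local_tz=ASSUMED_LOCAL_TZ):
    """Return the local-time stamp (YYYYMMDDHHmm) and its md5 hex digest."""
    stamp = now_utc.astimezone(local_tz).strftime("%Y%m%d%H%M")
    return stamp, hashlib.md5(stamp.encode("ascii")).hexdigest()


# The Client-Date value from the captured request.
captured = datetime.datetime(
    2022, 2, 12, 8, 58, 52, 208000, tzinfo=datetime.timezone.utc
)
stamp, header = x_security(captured)
print(stamp)
print(header)
```

<p>Because the stamp has one-minute resolution, the header value changes every minute, which fits the earlier observation that replayed headers stop working after a short while.</p>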

<h4 id="x-client-traceid">X-Client-TraceId</h4>

<p>Here’s the relevant part again:</p>

<figure class="highlight"><pre><code class="language-javascript" data-lang="javascript"><span class="kd">const</span> <span class="nx">e</span> <span class="o">=</span> <span class="nx">i</span><span class="p">().</span><span class="nx">toISOString</span><span class="p">();</span>
<span class="kd">let</span> <span class="nx">n</span> <span class="o">=</span> <span class="nx">e</span> <span class="o">+</span> <span class="nx">t</span> <span class="o">+</span> <span class="nx">r</span><span class="p">.</span><span class="nx">N</span><span class="p">.</span><span class="nx">tracing</span><span class="p">.</span><span class="nx">salt</span><span class="p">;</span>
<span class="k">return</span> <span class="nx">n</span> <span class="o">=</span> <span class="nx">s</span><span class="p">.</span><span class="nx">V</span><span class="p">.</span><span class="nx">hashStr</span><span class="p">(</span><span class="nx">n</span><span class="p">).</span><span class="nx">toString</span><span class="p">(),</span> <span class="p">{</span>
    <span class="dl">"</span><span class="s2">X-Client-TraceId</span><span class="dl">"</span><span class="p">:</span> <span class="nx">n</span><span class="p">,</span>
    <span class="p">...</span>
<span class="p">}</span></code></pre></figure>

<p>Leveraging what we found out already, this header is generated as follows:</p>
<ul>
  <li>The current time, <code class="language-plaintext highlighter-rouge">e</code>, is concatenated with two unknown strings, <code class="language-plaintext highlighter-rouge">t</code> and <code class="language-plaintext highlighter-rouge">salt</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">X-Client-TraceId</code> is the md5 hash of the result.</li>
</ul>

<p>Now the fastest thing to do is to use a JavaScript debugger to find out what <code class="language-plaintext highlighter-rouge">t</code> and <code class="language-plaintext highlighter-rouge">salt</code> are.
The Firefox debugger (Ctrl+Shift+Z) lets us beautify the script and put a breakpoint on this line. Hitting “Search” again triggers the breakpoint, and we can see the variables’ values:</p>

<p><img src="/assets/img/stockexchange4.png" alt="Breakpoint" /></p>

<p>So apparently:</p>
<ul>
  <li><code class="language-plaintext highlighter-rouge">t</code> is the requested URL, including the query string.</li>
  <li><code class="language-plaintext highlighter-rouge">salt</code> is a fixed string, in this case <code class="language-plaintext highlighter-rouge">w4icATTGtnjAZMbkL3kJwxMfEAKDa3MN</code>. It appears in the source code as-is, so it must be constant.</li>
  <li><code class="language-plaintext highlighter-rouge">X-Client-TraceId</code> is the MD5 hash of <code class="language-plaintext highlighter-rouge">time + url + salt</code>.</li>
</ul>

<p>Now we have all the information needed to generate valid requests to the API:</p>
<ul>
  <li>Take the current time and hash it to generate <code class="language-plaintext highlighter-rouge">X-Security</code>.</li>
  <li>Construct the URL with the parameters, concatenate the time, the URL and the salt (in that order), and hash the result to generate <code class="language-plaintext highlighter-rouge">X-Client-TraceId</code>.</li>
</ul>

<p>And it works! Here is a Python snippet to generate the headers for a given URL:</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">datetime</span>
<span class="kn">import</span> <span class="nn">hashlib</span>

<span class="k">def</span> <span class="nf">generate_headers</span><span class="p">(</span><span class="n">url</span><span class="p">):</span>
    <span class="n">salt</span> <span class="o">=</span> <span class="s">"w4icATTGtnjAZMbkL3kJwxMfEAKDa3MN"</span>
    <span class="n">current_time</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="n">datetime</span><span class="p">.</span><span class="n">now</span><span class="p">(</span><span class="n">tz</span><span class="o">=</span><span class="n">datetime</span><span class="p">.</span><span class="n">timezone</span><span class="p">.</span><span class="n">utc</span><span class="p">)</span>
    <span class="c1"># Same format as JavaScript toISOString(), e.g. 2022-06-19T18:07:06.123Z
</span>    <span class="n">client_date</span> <span class="o">=</span> <span class="p">(</span><span class="n">current_time</span>
        <span class="p">.</span><span class="n">isoformat</span><span class="p">(</span><span class="n">timespec</span><span class="o">=</span><span class="s">"milliseconds"</span><span class="p">)</span>
        <span class="p">.</span><span class="n">replace</span><span class="p">(</span><span class="s">"+00:00"</span><span class="p">,</span> <span class="s">"Z"</span><span class="p">)</span>
    <span class="p">)</span>
    <span class="n">client_traceid</span> <span class="o">=</span> <span class="n">hashlib</span><span class="p">.</span><span class="n">md5</span><span class="p">(</span>
        <span class="p">(</span><span class="n">client_date</span> <span class="o">+</span> <span class="n">url</span> <span class="o">+</span> <span class="n">salt</span><span class="p">).</span><span class="n">encode</span><span class="p">(</span><span class="s">"utf-8"</span><span class="p">)</span>
    <span class="p">)</span>
    <span class="c1"># X-Security: MD5 of the UTC time, truncated to the minute
</span>    <span class="n">security</span> <span class="o">=</span> <span class="n">hashlib</span><span class="p">.</span><span class="n">md5</span><span class="p">(</span>
        <span class="n">current_time</span><span class="p">.</span><span class="n">strftime</span><span class="p">(</span><span class="s">"%Y%m%d%H%M"</span><span class="p">).</span><span class="n">encode</span><span class="p">(</span><span class="s">"utf-8"</span><span class="p">)</span>
    <span class="p">)</span>

    <span class="k">return</span> <span class="p">{</span>
        <span class="s">"Client-Date"</span><span class="p">:</span> <span class="n">client_date</span><span class="p">,</span>
        <span class="s">"X-Client-TraceId"</span><span class="p">:</span> <span class="n">client_traceid</span><span class="p">.</span><span class="n">hexdigest</span><span class="p">(),</span>
        <span class="s">"X-Security"</span><span class="p">:</span> <span class="n">security</span><span class="p">.</span><span class="n">hexdigest</span><span class="p">()</span>
    <span class="p">}</span></code></pre></figure>
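
<p>To sanity-check the logic without hitting the API, it helps to pin the timestamp. The sketch below is a variant of the function above that takes the time as an argument. The name <code class="language-plaintext highlighter-rouge">generate_headers_at</code>, the <code class="language-plaintext highlighter-rouge">when</code> parameter and the <code class="language-plaintext highlighter-rouge">api.example.com</code> URL are placeholders of mine, not something taken from the site. It builds a <code class="language-plaintext highlighter-rouge">urllib</code> request object with the headers attached, but does not send it:</p>

```python
import datetime
import hashlib
import urllib.request

SALT = "w4icATTGtnjAZMbkL3kJwxMfEAKDa3MN"  # the constant found in the script


def generate_headers_at(url, when):
    """Same logic as generate_headers(), for an explicit UTC datetime.

    `when` is a hypothetical parameter added so the output is reproducible.
    """
    # Mimic JavaScript toISOString(): millisecond precision, "Z" suffix.
    client_date = (
        when.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    )
    trace_id = hashlib.md5(
        (client_date + url + SALT).encode("utf-8")
    ).hexdigest()
    # X-Security depends only on the time, truncated to the minute.
    security = hashlib.md5(
        when.strftime("%Y%m%d%H%M").encode("utf-8")
    ).hexdigest()
    return {
        "Client-Date": client_date,
        "X-Client-TraceId": trace_id,
        "X-Security": security,
    }


# Placeholder URL: substitute the real endpoint and query string.
url = "https://api.example.com/search?q=abc"
utc = datetime.timezone.utc
t0 = datetime.datetime(2022, 6, 19, 18, 7, 6, 123000, tzinfo=utc)
t1 = t0 + datetime.timedelta(seconds=30)  # 18:07:36.123, same minute as t0

h0 = generate_headers_at(url, t0)
h1 = generate_headers_at(url, t1)

print(h0["Client-Date"])                                 # 2022-06-19T18:07:06.123Z
print(h0["X-Security"] == h1["X-Security"])              # True
print(h0["X-Client-TraceId"] == h1["X-Client-TraceId"])  # False

# Build (but do not send) a request carrying the generated headers.
request = urllib.request.Request(url, headers=h0)
```

<p>Since <code class="language-plaintext highlighter-rouge">X-Security</code> is derived only from the time truncated to the minute, it is identical for every request made within the same minute, while <code class="language-plaintext highlighter-rouge">X-Client-TraceId</code> changes with every millisecond and every URL.</p>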

<h3 id="concluding">Concluding</h3>
<p>What was the purpose of these headers? I’m really not sure. It could be protection against bots or maybe a user-tracking mechanism. Anyway, it didn’t take much work to understand it. I guess if you are exposing your API on the internet, expect someone to figure it out and use it.</p>]]></content><author><name></name></author><category term="reversing" /><summary type="html"><![CDATA[This was a fun afternoon reverse engineering project so I figured I’d write a bit about it.]]></summary></entry></feed>