Matthew McPherrin

CT: Managing uptime as a CA

2025-10-20T00:00:00+00:00

This talk was given at the transparency.dev summit 2025.

As a Certificate Authority, Let’s Encrypt needs to submit pre-certificates to CT logs to get SCTs for embedding in certificates. This adds an external dependency to our issuance process, which has historically been one of the big concerns CAs have had about CT. In this talk, we’ll discuss how we’ve managed that availability risk through the different submission algorithms we’ve used, and what real-world impact CT has had on our certificate issuance process. Looking forward, we’ll cover how our new Static CT logs offer different tradeoffs in latency and reliability that will make the Let’s Encrypt CA more reliable overall, and how operating CT logs helps us keep our CA running.

A recording of the talk is on YouTube.

Ten Years as a Free, Open, and Automated Certificate Authority

2025-03-25T00:00:00+00:00

This talk was given at SRECon25 Americas.

Ubiquitous HTTPS is an essential part of a secure and privacy-respecting Internet. To that end, the public benefit certificate authority Let’s Encrypt has been issuing TLS certificates free of cost in a reliable, automated, and trustworthy manner for ten years. In that time, we’ve grown to serving over 500,000,000 websites.

In this talk we’ll dive into the history of Let’s Encrypt and share helpful context for those managing TLS certificates, as well as information about upcoming changes to Let’s Encrypt and guidance for the future. We’ll also cover how we have strived to make the working lives of SREs around the world easier, and how the SRE community has helped us in return.

Slides and recording are linked on https://www.usenix.org/conference/srecon25americas/presentation/mcpherrin

Shining a new light on Certificate Transparency

2024-10-10T00:00:00+00:00

This talk was given at the transparency.dev summit 2024.

Let’s Encrypt operates one of the largest Certificate Authorities, along with Certificate Transparency logs. This talk discusses our experiences both as a submitter to CT as well as a log operator. We are excited about the new Sunlight log implementation, and will cover why we think Sunlight will be a good fit for us and for the larger CT ecosystem.

Slides and recording are linked on https://transparency.dev/summit2024/letsencrypt.html

Web PKI Revocation is Broken (but we can fix it!)

2023-10-21T00:00:00+00:00

The web public key infrastructure is used to secure HTTPS connections between browsers and websites using certificates. Today, when something goes wrong, browsers can’t reliably find out those certificates have been revoked. We examine past and future solutions to this problem, and how we can make progress on fixing revocation.

This talk covers the history of the protocols used to communicate web PKI revocation, including CRLs and OCSP, and discusses why they have been ineffective. It then goes over improvements that are underway now: making certificate lifetimes shorter, and reviving CRLs with new distribution and compression mechanisms. Finally, it discusses why new solutions might work where previous attempts like OCSP stapling failed.

The talk was given at the Cryptography and Privacy Village at DEF CON 31, and then at BSides Toronto 2023.

The BSides recording is available on YouTube.

The S in HTTPS

2023-09-29T00:00:00+00:00

I gave a talk for the Infrastructure Club, introducing how we communicate securely online.

You can find the talk on the Infrastructure Club YouTube channel.

The slide deck is linked here.

Operationalizing SPIRE at Square

2020-05-04T00:00:00+00:00

At the Spring 2020 SPIFFE community day, I gave a talk about my work to take our SPIRE deployment from a prototype to production software we relied on.

You can watch the talk on YouTube.

I also gave a talk at a 2019 community day.

That talk is also on YouTube.

Colliding the sum checksum

2019-11-23T00:00:00+00:00

The sum command-line tool is a simple checksum utility included in BSD and GNU Coreutils. It is not a cryptographically secure hash, and so I wrote a tool to set the sum of a file to an arbitrary value.

You can find that tool and the associated code from this post in the sumcoll repository on my GitHub account.

Usage

Sum takes file(s) on the command line and prints a checksum value in decimal, along with a count of 1024-byte blocks.

$ echo abcdef > file
$ sum file
33901     1

Backstory

Some years ago I was asked to include the BSD sums of a homework assignment, to verify I had not modified the code after the assignment deadline. The sums of the files were included with a written portion submitted on paper in class, and the code was graded later.

Security

Using a non-cryptographically secure hash for this task is totally insecure, as a malicious student could find another file which has the same checksum as what was submitted on the paper handout. The property needed to prevent that is called Second Preimage Resistance. The malicious student gets to pick m1 before the assignment deadline, and then afterwards, if they can find m2 such that hash(m1) = hash(m2), they can submit m2, a second preimage, without detection. Whether m2 will get them more marks is left as an exercise to the student.

Even with no knowledge of the algorithm used, we can tell it’s not secure. We can observe that the output of the sum utility is a 16-bit number (usually printed in decimal), along with a count of how many 1024-byte blocks are in the input. A 16-bit output is simply not large enough, and we can use brute force to find collisions: if we sum 2^16 + 1 distinct files, the pigeonhole principle guarantees a collision. That’s not quite the second preimage we want, but it illustrates why a small checksum value cannot be cryptographically secure.
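To make the pigeonhole argument concrete, here is a minimal sketch that hashes numbered variants of a string until two of them share a checksum. It reimplements the checksum following the algorithm described in the next section; the `find_collision` helper and its `file ` prefix are hypothetical illustrations, not part of the sumcoll tool.

```python
import itertools


def rotate_right_16bit(value: int) -> int:
    """Rotate a 16-bit value right by one bit."""
    return (value >> 1) | ((value & 1) << 15)


def bsd_sum(data: bytes) -> int:
    """BSD-style 16-bit checksum: rotate right, then add the next byte."""
    checksum = 0
    for byte in data:
        checksum = (rotate_right_16bit(checksum) + byte) & 0xFFFF
    return checksum


def find_collision(prefix: bytes = b"file ") -> tuple[bytes, bytes]:
    """Hash distinct numbered inputs until two share a checksum.

    With only 2**16 possible checksums, the pigeonhole principle guarantees
    a repeat within 2**16 + 1 distinct inputs.
    """
    seen: dict[int, bytes] = {}
    for n in itertools.count():
        candidate = prefix + str(n).encode()
        checksum = bsd_sum(candidate)
        if checksum in seen:
            return seen[checksum], candidate
        seen[checksum] = candidate
```

This finds a plain collision, where we get to choose both inputs. The homework scenario is harder, since the checksum on the paper is already fixed.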

The Algorithm

There’s no need to blindly brute-force, as we can examine how sum works and be a bit more intelligent about how we attack it. The algorithm is very simple: each byte is added into a 16-bit counter, and the counter is rotated right by one bit before each addition. Or, in Python:

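def rotate_right_16bit(data: int) -> int:
    return (data >> 1) | ((data & 1) << 15)

def bsd_sum(data: bytes) -> int:
    checksum = 0  # named to avoid shadowing Python's built-in sum()
    for byte in data:
        checksum = rotate_right_16bit(checksum) + byte
        checksum = checksum & 0xffff  # clamp to 16 bits
    return checksum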

Going Backwards

From looking at this algorithm, the first thing to notice is that all the operations are reversible: we can subtract and rotate left to go backwards. That lets us add or change characters at any point in the file, and easily work backwards from our final goal value to find the intermediate checksum value we need at that point.

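def rotate_left_16bit(data: int) -> int:
    return (0xffff & (data << 1)) | (data >> 15)

def sum_before_suffix(goal: int, suffix: bytes) -> int:
    checksum = goal
    for byte in reversed(suffix):
        checksum = rotate_left_16bit((checksum - byte) & 0xffff)  # undo the add, then the rotate
    return checksum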

Making Changes

The tool we’re writing is going to insert bytes at a particular offset into the file. We’ll put the extra inserted characters in a comment, string, or some location that can be modified. We’ll compute the sum over a prefix up to the insertion point, and use the backwards sum computation to approach that same point from the opposite direction.
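A minimal sketch of those two directions, assuming the insertion point is given as a byte offset into the original file. `forward`, `backward`, and `split_targets` are hypothetical helper names for illustration.

```python
def rotate_right_16bit(value: int) -> int:
    return (value >> 1) | ((value & 1) << 15)


def rotate_left_16bit(value: int) -> int:
    return (0xFFFF & (value << 1)) | (value >> 15)


def forward(checksum: int, data: bytes) -> int:
    """Continue the sum algorithm over data from an intermediate checksum."""
    for byte in data:
        checksum = (rotate_right_16bit(checksum) + byte) & 0xFFFF
    return checksum


def backward(checksum: int, data: bytes) -> int:
    """Undo the sum algorithm over data: subtract each byte, then rotate left."""
    for byte in reversed(data):
        checksum = rotate_left_16bit((checksum - byte) & 0xFFFF)
    return checksum


def split_targets(original: bytes, offset: int, goal: int) -> tuple[int, int]:
    """Return (sum after the prefix, sum required just before the suffix)."""
    prefix, suffix = original[:offset], original[offset:]
    return forward(0, prefix), backward(goal, suffix)
```

When `goal` is the original file’s own checksum, the two values come out equal, which makes a handy sanity check. For any other goal they generally differ, and the inserted bytes have to bridge the gap.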

Our task now is to find a sequence of characters that, when inserted after the prefix, makes the forward sum at that point equal the backwards sum. This is likely brute-forceable, but because we can go backwards, we can do a bit better.
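For comparison, the brute-force approach might look like this minimal sketch, which tries every filler of a fixed length drawn from an allowed set of bytes. `brute_force_filler` is a hypothetical name. The search space grows as `len(charset) ** length`, which is why a smarter search is worth having.

```python
import itertools


def rotate_right_16bit(value: int) -> int:
    return (value >> 1) | ((value & 1) << 15)


def forward(checksum: int, data: bytes) -> int:
    """Continue the sum algorithm over data from an intermediate checksum."""
    for byte in data:
        checksum = (rotate_right_16bit(checksum) + byte) & 0xFFFF
    return checksum


def brute_force_filler(start: int, target: int, charset: bytes, length: int):
    """Try every filler of exactly `length` bytes; return one mapping start to target, or None."""
    for combo in itertools.product(charset, repeat=length):
        filler = bytes(combo)
        if forward(start, filler) == target:
            return filler
    return None
```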

Meet In The Middle

The strategy we’ll take for our tool is going to take advantage of the fact that the set of possible hashes is small, and that we can go backwards.

We’ll alternate extending prefix and suffix strings by an extra character and then checking if the resulting hash is in the opposite set. That is, we add to the end of the prefix and check if we’ve hit any of the suffix hashes, and then prepend to the suffix and check if we’ve hit any of the prefix hashes.
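Here is a minimal sketch of that alternating search. It is written as a breadth-first search over checksum values rather than over strings, which is my own assumption and not necessarily how sumcoll structures it. `heads` maps each checksum reachable after the prefix to the bytes that reach it, and `tails` maps each checksum from which the suffix reaches the goal to the bytes that get there. Because only 2^16 checksums exist, both tables stay small. `meet_in_the_middle`, `collide`, and `max_rounds` are hypothetical names.

```python
def rotate_right_16bit(value: int) -> int:
    return (value >> 1) | ((value & 1) << 15)


def rotate_left_16bit(value: int) -> int:
    return (0xFFFF & (value << 1)) | (value >> 15)


def step(checksum: int, byte: int) -> int:
    """One forward step of sum: rotate right, then add the byte."""
    return (rotate_right_16bit(checksum) + byte) & 0xFFFF


def unstep(checksum: int, byte: int) -> int:
    """Invert step(): subtract the byte, then rotate left."""
    return rotate_left_16bit((checksum - byte) & 0xFFFF)


def forward(checksum: int, data: bytes) -> int:
    for byte in data:
        checksum = step(checksum, byte)
    return checksum


def meet_in_the_middle(start: int, target: int, charset: bytes, max_rounds: int = 16):
    """Find filler bytes from charset with forward(start, filler) == target, or None."""
    if start == target:
        return b""
    heads = {start: b""}   # checksum -> bytes leading from start to it
    tails = {target: b""}  # checksum -> bytes leading from it to target
    head_frontier, tail_frontier = dict(heads), dict(tails)
    for _ in range(max_rounds):
        # Append one byte to each newest head; check against every known tail.
        new_heads = {}
        for checksum, head in head_frontier.items():
            for byte in charset:
                nxt = step(checksum, byte)
                if nxt in tails:
                    return head + bytes([byte]) + tails[nxt]
                if nxt not in heads and nxt not in new_heads:
                    new_heads[nxt] = head + bytes([byte])
        heads.update(new_heads)
        head_frontier = new_heads
        # Prepend one byte to each newest tail; check against every known head.
        new_tails = {}
        for checksum, tail in tail_frontier.items():
            for byte in charset:
                prev = unstep(checksum, byte)
                if prev in heads:
                    return heads[prev] + bytes([byte]) + tail
                if prev not in tails and prev not in new_tails:
                    new_tails[prev] = bytes([byte]) + tail
        tails.update(new_tails)
        tail_frontier = new_tails
    return None


def collide(original: bytes, offset: int, goal: int, charset: bytes):
    """Insert filler at offset so the whole file sums to goal; None if not found."""
    prefix, suffix = original[:offset], original[offset:]
    target = goal
    for byte in reversed(suffix):  # work backwards from the goal over the suffix
        target = unstep(target, byte)
    filler = meet_in_the_middle(forward(0, prefix), target, charset)
    return None if filler is None else prefix + filler + suffix
```

Here the offset should point inside a comment or string, so the inserted filler does not change what the file means.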

Because of the birthday paradox, and the fact that there are only 2^16 possible hashes, the attack takes only a split second to run on my computer.

Character Sets

The code for generating collisions takes a bytestring as input, and will only use bytes from it. This means you can, say, feed it only printable characters, only lowercase ASCII letters, or whatever you need to avoid breaking the file you’re trying to collide.

Typically that means you want printable characters without newlines for “to the end of the line” style comments, or you can avoid / if you are using something like CSS that only supports /* */ style comments.
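As a sketch, the bytestrings for those cases could be built like this, assuming plain ASCII source files. The constant names are hypothetical; any of them could be passed as the bytestring the collision code accepts.

```python
import string

# Printable ASCII from space (0x20) through "~" (0x7E): no newline, tab, or
# other control bytes, so the filler cannot end a "// ..." or "# ..." comment.
LINE_COMMENT_SAFE = bytes(range(0x20, 0x7F))

# For CSS-style /* ... */ comments, also drop "/" so the filler itself can
# never contain the "*/" sequence that would close the comment early.
BLOCK_COMMENT_SAFE = LINE_COMMENT_SAFE.replace(b"/", b"")

# Lowercase ASCII letters only, for places where even punctuation is awkward.
LOWERCASE_ONLY = string.ascii_lowercase.encode("ascii")
```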

Block Size

Inserting a few bytes could change the number of blocks in the input. You’ll have to figure out what to do about that yourself. Try inserting the new bytes at a different location, or removing some other bytes first.
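One way to detect the problem up front, as a minimal sketch: assume partial blocks are rounded up, which is consistent with the 7-byte file in the usage example being reported as 1 block. The helper names are hypothetical.

```python
BLOCK_SIZE = 1024


def block_count(size: int) -> int:
    """Number of 1024-byte blocks, rounding a partial final block up."""
    return (size + BLOCK_SIZE - 1) // BLOCK_SIZE


def insertion_changes_blocks(original: bytes, filler: bytes) -> bool:
    """True if inserting filler would change the block count that sum prints."""
    return block_count(len(original)) != block_count(len(original) + len(filler))
```

Checking this before searching avoids producing a file whose checksum matches but whose block count gives it away.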

Conclusion

This isn’t exactly a hardened target, so there’s nothing really novel here, but I haven’t found any documentation of how to accomplish this elsewhere.

I doubt anyone will ever need this, but there is both an implementation of the sum tool and the attack tool here. If you have feedback on that tool or this blog post, feel free to post issues there, or to email me.

Blog

2019-10-23T00:00:00+00:00

I have decided to set up a blog to organize and share things I do, just like everyone else’s blogs. I’ve made several decisions in how I’d like my blog to work.

The first is that I want to store the website’s content in source control. As a software developer I’m very comfortable with the edit, run, CI and CD flow. I can write in the same editor I always use. I can make local changes offline, test pages, and deploy when ready. Or I can just open github.com and use the in-browser editor. Git is the tool of choice for me, hosted on github.com.

A statically generated website is easy to host anywhere. I’m using GitHub and GitHub Pages because they’re convenient and free, but I could easily switch to AWS CodePipeline publishing to S3, or a Docker container with Nginx serving the static content.

Thus this setup is not tied to any particular vendor. In a few minutes I could completely abandon GitHub should the need ever arise. That’s important to me, as I want to be in control of my own content. Leaving a hosted blogging solution could be much more complicated.

So the next question is which static site generator to use. GitHub Pages supports Jekyll, which makes it a great candidate. I’ve been testing it out and it seems to work well enough. Markdown is easy to write for posts, and the Liquid templating system is straightforward. I’m not using many advanced features, so switching shouldn’t be too hard if I decide I don’t like Jekyll in the future.

Finally, I have decided to make my own design rather than using a template. I was able to make a basic blog template with no major issues in an evening, which is another reason that Jekyll is working well. I made a base page layout, an index page, and a post layout. It’ll probably be pretty ugly, but it’ll be mine.