Skip to content

renewal: use lineage-specific server for ARI#10307

Merged
ohemorange merged 8 commits into
certbot:mainfrom
jsha:check-ari-against-lineage-server
Jun 9, 2025
Merged

renewal: use lineage-specific server for ARI#10307
ohemorange merged 8 commits into
certbot:mainfrom
jsha:check-ari-against-lineage-server

Conversation

@jsha

@jsha jsha commented Jun 2, 2025

Copy link
Copy Markdown
Contributor

Previously, we were constructing an ACME client for ARI checking that used the global value for server, not the one recorded in a lineage's renewal file.

This resulted in errors in the logs and failure to observe ARI for lineages that used a non-default --server (e.g. staging or non-Let's Encrypt CAs).

@ohemorange ohemorange left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks good, but would you mind adding a short integration test in certbot-ci/src/certbot_integration_tests/certbot_tests/test_main.py? Both as a regression test for this in particular and just generally. That can happen in a different PR if you'd prefer.

Comment thread certbot/CHANGELOG.md Outdated
*
* ACME Renewal Info (ARI) support. https://datatracker.ietf.org/doc/draft-ietf-acme-ari/
`certbot renew` will automatically check ARI when using an ACME server that supports it,
and may renew early based on the ARI information.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a good place to add a note about ari overriding renew_before_expiry values set in the config file if it comes first

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#10312 tracks making sure this entry is fully fleshed out, so leaving it up to you to merge now or add detail here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder. Pushed an edit adding detail. Lemme know what you think.

@jsha

jsha commented Jun 3, 2025

Copy link
Copy Markdown
Contributor Author

The fix looks good, but would you mind adding a short integration test

Yes, this is a good idea. I'm not totally sure how to proceed though. How does this sound:

  • Issue a certificate against local Pebble.
  • Set up a local web server that offers a directory object with an ARI path, and handles the ARI path by always returning values meaning "renew now".
  • Edit the renewal config for that cert to change the server value.
  • Run certbot renew.
  • Expect the certificate got renewed.

@ohemorange

Copy link
Copy Markdown
Contributor

Oh, does pebble not implement ARI? If not, how come? I was definitely assuming that it does. I think the most correct thing would probably be to put it in there, but that's probably a larger ask than I realized.

@jsha

jsha commented Jun 3, 2025

Copy link
Copy Markdown
Contributor Author

Pebble does support ARI, but it doesn't have a way to twiddle ARI to say "renew now." We could add that of course.

But the main thing I was thinking with my example is that we wanted to specifically have an integration test for this bug. That is, there should be a renewal config where the server for that lineage is different than the globally set server. I think that requires having a separate ACME server. I guess we could set up a whole second Pebble but that seems slightly overkill vs starting up an HTTP server within the Python integration test. WDYT?

@ohemorange

Copy link
Copy Markdown
Contributor

Ok so first question, the ARI PR changed certbot-ci.utils.certbot_call from using force-renew to not in main, what was the effect of that? To make sure we're hitting the ari code? But then presumably we were doing some testing where we did in fact want it to renew, is that not being tested now? Or is ari turned off in the test pebble instance?

As for this bug...starting two pebbles would be closer to a realistic test, which is the point of integration tests, but yeah that might be too much overhead. But maybe not? But actually, does it matter? This bug happens when you try to renew, and the lineage config has one server saved in the renewal config but another set by cli/default server if none specified. So we could issue a cert against pebble, and then call renew without the flag specifically setting pebble as the server, and that should repro without the fix.

@jsha

jsha commented Jun 3, 2025

Copy link
Copy Markdown
Contributor Author

the ARI PR changed certbot-ci.utils.certbot_call from using force-renew to not in main, what was the effect of that?

Ugh, this was meant to be a local-only change that was useful when I was running certbot_test to manually test the new functionality. I aimed to revert it before PR review, but it looks like I accidentally put it back in after a subsequent round of testing and rebasing (and I only just now noticed your comment about it).

I am also surprised it didn't break any cases. Looking at the code, the affected line is only used by main, which is what gets called if you run certbot_test from the commandline. The actual test cases call internally via certbot_test() and pass force_renew=True or =False based on what they want.

is ari turned off in the test pebble instance?

In the test Pebble instance, ARI works, but it returns a default renewal period that's in the future.

we could issue a cert against pebble, and then call renew without the flag specifically setting pebble as the server, and that should repro without the fix.

There are two symptoms of this bug:

  1. We see an error logged for the failed ARI request (because it went to the wrong server, and that server doesn't know about the cert / issuer).
  2. Early renewal based on ARI doesn't happen (because we're not actually querying the right server).

I think the integration test should be checking (2), and for that we need to have one server that says "please renew now," and it should be different from the default server.

@ohemorange

Copy link
Copy Markdown
Contributor

There are two symptoms of this bug:

1. We see an error logged for the failed ARI request (because it went to the wrong server, and that server doesn't know about the cert / issuer).
2. Early renewal based on ARI doesn't happen (because we're not actually querying the right server).

I think the integration test should be checking (2), and for that we need to have one server that says "please renew now," and it should be different from the default server.

Ah yes, I was only thinking about checking (1). Presumably that error would always happen when we're not checking the right server though, no? If we can confirm we're querying the right server by not seeing the error, what is the additional benefit to (2)? Were you thinking of a more general test to make sure we actually renew early based on ari interval? In that case I'd still prefer to be as close as possible to a real situation; could we configure pebble to make a really short lived cert maybe? Probably adding a config to pebble for immediate ari would be cleaner though.

@jsha

jsha commented Jun 3, 2025

Copy link
Copy Markdown
Contributor Author

Ah yes, I was only thinking about checking (1). Presumably that error would always happen when we're not checking the right server though, no? If we can confirm we're querying the right server by not seeing the error, what is the additional benefit to (2)?
Were you thinking of a more general test to make sure we actually renew early based on ari interval?

Yeah, as you say I was thinking it would be useful to have the more general test. I also think it would be a more robust test. My experience with checking for the presence of log lines is that it's prone to failing spuriously (e.g. due to slight rewordings or refactorings). Checking for the absence of log lines has the opposite problem: slight rewording could result in the test always passing, even if the actual functionality regresses.

If we test for "does it actually renew early," we get a double benefit with one test case:

  • integration test for ARI checking in general (to complement the unittest)
  • integration test for the specific case of lineage server that differs from default server.

It's true that Pebble should probably have a hook to say "trigger ARI now." If we land that in Pebble, do you think the two-Pebbles approach would be the right one here?

@ohemorange

Copy link
Copy Markdown
Contributor

That's a good point about the rewording of log lines, especially in integration tests.

It's true that Pebble should probably have a hook to say "trigger ARI now." If we land that in Pebble, do you think the two-Pebbles approach would be the right one here?

I still don't understand why we'd need two pebbles in that case. What we're testing isn't "specific server 1 (used to issue cert) vs specific server 2 being used on renew," it's "specific server 1 vs default certbot server being used on renew." A call to "certbot renew" could simply not specify a server, and then if it renews we'd be sure it's actually using the server from the renewal config file, and not trying to ping the default certbot server.

@jsha

jsha commented Jun 4, 2025

Copy link
Copy Markdown
Contributor Author

The integration test framework already sets --server <some pebble URL> for each call, so both the initial issuance (certbot --server <some pebble> --no-verify-ssl --http-01-port .... certonly -d example123.com) and the renewal (certbot --server <some pebble> --no-verify-ssl --http-01-port .... renew) would have the same server.

But I suppose for the renew command I can have a call to IntegrationTestsContext.certbot(['renew', '--server', '<nonsense URL>']). The --server will be appended after the initial --server created by test setup, so will take precedence as the global value for the --server flag. And then the lineage config would have the real, functional <some pebble> URL, and we would only see the ARI response telling us to renew early if we chose the real, functional <some pebble>. I think that makes sense and can proceed with implementing.

@ohemorange

Copy link
Copy Markdown
Contributor

It should be pretty easy to modify the integration framework to simply not set the server for a particular call, no? Probably by doing something like adding an optional parameter? Or would that require too many layers of plumbing that would become messy? We have simply not had need to do it in the past but it's our own integration framework that we have developed for convenience and is entirely internal so we don't have to worry about breaking changes or anything.

@bmw bmw added this to the Certbot 4.1.0 milestone Jun 4, 2025
ohemorange pushed a commit that referenced this pull request Jun 6, 2025
This depends on a pending Pebble pull request and so will fail
integration tests until/unless that lands:
letsencrypt/pebble#501

However, I'd appreciate some eyes on this PR in this regard: is the
interface we're using in Pebble useful and appropriate? If not, we can
adjust the Pebble PR.

Inspired based on conversation on
#10307, but note that this just
tests the general case; it does not test the "default server differs
from lineage server" case yet; when I try adding that I get some bugs
that may reflect a problem in #10307 I need to fix (or may reflect that
I need to inhibit the `--server` flag rather than trying to override it
late in the command line).
jsha added 3 commits June 6, 2025 14:54
Previously, we were constructing an ACME client for ARI checking that used
the global value for `server`, not the one recorded in a lineage's renewal file.

This resulted in errors in the logs and failure to observe ARI for lineages that
used a non-default `--server` (e.g. staging or non-Let's Encrypt CAs).
@jsha jsha force-pushed the check-ari-against-lineage-server branch from d1252ab to 0592e0d Compare June 6, 2025 21:56
@jsha

jsha commented Jun 6, 2025

Copy link
Copy Markdown
Contributor Author

I've rebased on main to pull in the new integration test from #10311.

I also updated that integration test for this specific case. Now, on renewal, it inhibits the --server flag. That way we can tell the difference between checking ARI against the lineage server (which will work, and say "renew now") vs checking it against the built-in default (which would fail).

To confirm, if a user calls certbot renew --server http://example.com/, any certs that need renewal will use http://example.com/. Should we still check ARI against the old lineage server if it's different?

@jsha

jsha commented Jun 6, 2025

Copy link
Copy Markdown
Contributor Author

By the way, worth noting that I took a shortcut by making "" a special value for omitting the directory URL; probably all the layers that take directory_url should be an Optional instead. I'll work on that now.

Comment thread certbot-ci/src/certbot_integration_tests/certbot_tests/test_main.py Outdated
Comment thread certbot-ci/src/certbot_integration_tests/utils/certbot_call.py Outdated
ohemorange
ohemorange previously approved these changes Jun 6, 2025
if force_renew:
additional_args.append('--renew-by-default')

if directory_url:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically this is a behavior change since before we would have happily passed through "", but this is better imo

Comment thread certbot/CHANGELOG.md Outdated
Comment thread certbot/CHANGELOG.md
Co-authored-by: ohemorange <ebportnoy@gmail.com>
@ohemorange ohemorange merged commit 1d9fc8d into certbot:main Jun 9, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants