renewal: use lineage-specific server for ARI#10307
Conversation
ohemorange
left a comment
There was a problem hiding this comment.
The fix looks good, but would you mind adding a short integration test in certbot-ci/src/certbot_integration_tests/certbot_tests/test_main.py? Both as a regression test for this in particular and just generally. That can happen in a different PR if you'd prefer.
| * | ||
| * ACME Renewal Info (ARI) support. https://datatracker.ietf.org/doc/draft-ietf-acme-ari/ | ||
| `certbot renew` will automatically check ARI when using an ACME server that supports it, | ||
| and may renew early based on the ARI information. |
There was a problem hiding this comment.
This may be a good place to add a note about ari overriding renew_before_expiry values set in the config file if it comes first
There was a problem hiding this comment.
#10312 tracks making sure this entry is fully fleshed out, so leaving it up to you to merge now or add detail here
There was a problem hiding this comment.
Thanks for the reminder. Pushed an edit adding detail. Lemme know what you think.
Yes, this is a good idea. I'm not totally sure how to proceed though. How does this sound:
|
|
Oh, does pebble not implement ARI? If not, how come? I was definitely assuming that it does. I think the most correct thing would probably be to put it in there, but that's probably a larger ask than I realized. |
|
Pebble does support ARI, but it doesn't have a way to twiddle ARI to say "renew now." We could add that of course. But the main thing I was thinking with my example is that we wanted to specifically have an integration test for this bug. That is, there should be a renewal config where the server for that lineage is different than the globally set server. I think that requires having a separate ACME server. I guess we could set up a whole second Pebble but that seems slightly overkill vs starting up an HTTP server within the Python integration test. WDYT? |
|
Ok so first question, the ARI PR changed As for this bug...starting two pebbles would be closer to a realistic test, which is the point of integration tests, but yeah that might be too much overhead. But maybe not? But actually, does it matter? This bug happens when you try to renew, and the lineage config has one server saved in the renewal config but another set by cli/default server if none specified. So we could issue a cert against pebble, and then call renew without the flag specifically setting pebble as the server, and that should repro without the fix. |
Ugh, this was meant to be a local-only change that was useful when I was running I am also surprised it didn't break any cases. Looking at the code, the affected line is only used by
In the test Pebble instance, ARI works, but it returns a default renewal period that's in the future.
There are two symptoms of this bug:
I think the integration test should be checking (2), and for that we need to have one server that says "please renew now," and it should be different from the default server. |
Ah yes, I was only thinking about checking (1). Presumably that error would always happen when we're not checking the right server though, no? If we can confirm we're querying the right server by not seeing the error, what is the additional benefit to (2)? Were you thinking of a more general test to make sure we actually renew early based on ari interval? In that case I'd still prefer to be as close as possible to a real situation; could we configure pebble to make a really short lived cert maybe? Probably adding a config to pebble for immediate ari would be cleaner though. |
Yeah, as you say I was thinking it would be useful to have the more general test. I also think it would be a more robust test. My experience with checking for the presence of log lines is that it's prone to failing spuriously (e.g. due to slight rewordings or refactorings). Checking for the absence of log lines has the opposite problem: slight rewording could result in the test always passing, even if the actual functionality regresses. If we test for "does it actually renew early," we get a double benefit with one test case:
It's true that Pebble should probably have a hook to say "trigger ARI now." If we land that in Pebble, do you think the two-Pebbles approach would be the right one here? |
|
That's a good point about the rewording of log lines, especially in integration tests.
I still don't understand why we'd need two pebbles in that case. What we're testing isn't "specific server 1 (used to issue cert) vs specific server 2 being used on renew," it's "specific server 1 vs default certbot server being used on renew." A call to "certbot renew" could simply not specify a server, and then if it renews we'd be sure it's actually using the server from the renewal config file, and not trying to ping the default certbot server. |
|
The integration test framework already sets But I suppose for the |
|
It should be pretty easy to modify the integration framework to simply not set the server for a particular call, no? Probably by doing something like adding an optional parameter? Or would that require too many layers of plumbing that would become messy? We have simply not had need to do it in the past but it's our own integration framework that we have developed for convenience and is entirely internal so we don't have to worry about breaking changes or anything. |
This depends on a pending Pebble pull request and so will fail integration tests until/unless that lands: letsencrypt/pebble#501 However, I'd appreciate some eyes on this PR in this regard: is the interface we're using in Pebble useful and appropriate? If not, we can adjust the Pebble PR. Inspired based on conversation on #10307, but note that this just tests the general case; it does not test the "default server differs from lineage server" case yet; when I try adding that I get some bugs that may reflect a problem in #10307 I need to fix (or may reflect that I need to inhibit the `--server` flag rather than trying to override it late in the command line).
Previously, we were constructing an ACME client for ARI checking that used the global value for `server`, not the one recorded in a lineage's renewal file. This resulted in errors in the logs and failure to observe ARI for lineages that used a non-default `--server` (e.g. staging or non-Let's Encrypt CAs).
d1252ab to
0592e0d
Compare
|
I've rebased on I also updated that integration test for this specific case. Now, on renewal, it inhibits the To confirm, if a user calls |
|
By the way, worth noting that I took a shortcut by making |
| if force_renew: | ||
| additional_args.append('--renew-by-default') | ||
|
|
||
| if directory_url: |
There was a problem hiding this comment.
technically this is a behavior change since before we would have happily passed through "", but this is better imo
Co-authored-by: ohemorange <ebportnoy@gmail.com>
Previously, we were constructing an ACME client for ARI checking that used the global value for
server, not the one recorded in a lineage's renewal file.This resulted in errors in the logs and failure to observe ARI for lineages that used a non-default
--server(e.g. staging or non-Let's Encrypt CAs).