-
-
Notifications
You must be signed in to change notification settings - Fork 631
wfe/ra: Periodically load rate limit overrides from the database #8407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… into NewTransactionBuilder
|
@jprenken, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values. |
|
@jprenken, this PR adds one or more new feature flags: OverridesFromDB. As such, this PR must be accompanied by a review of the Let's Encrypt CP/CPS to ensure that our behavior both before and after this flag is flipped is compliant with that document. Please conduct such a review, then add your findings to the PR description in a paragraph beginning with "CPS Compliance Review:". |
jsha
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had some initial style/organizational comments, but I also have a high-level concern. We've talked before about how blocking the process on a database query might become an issue, for instance if we are restarting a datacenter and the RA comes up before the database. It would be nice to not have an ordering dependency on startup; everything can just come up whenever and eventually become healthy. I see there's some code to backoff and retry the SA queries, but I suspect it will max out pretty quickly.
But there's also an issue of how long it takes even in normal operation to load the rate limit overrides. We've generally assumed these are going to be pretty fast, because there's not that many of them. But noticing that the RA is streaming results from the SA made me think again about how the list of overrides is only going to grow forever. Also, the growth rate is somewhat out of our control if we auto approve some requests. So we could wind up where an RA takes several seconds or potentially even minutes to start up, because it's loading overrides.
As a related issue: NewTransactionBuilder() does an SA RPC internally. In general I tend to assume constructor-like things run pretty fast and don't block on external services. How about this?
NewTransactionBuilder()returns quickly, with aTransactionBuilderthat has no overrides loaded.TransactionBuilder.NewRefresherattempts its first DB load immediately.- All DB loads have some amount of backoff and retry on error.
TransactionBuildergets a new method.Ready()that returns true if it has successfully loaded overrides once.- The RA's health check return true if:
TransactionBuilder.Ready()returns true, OR- N seconds have passed since startup.
That last point allows us to avoid serving without overrides data on most reloads, but also ensure that a failure to load the overrides doesn't completely prevent the RA from serving. And since we'll still have the refresher running, the RA gets additional chances to load overrides if the first one fails.
aarongable
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code and logic generally look great. A few high-level structural nits, but nothing that should be hard to address.
beautifulentropy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work on this, I'm very happy with how it turned out!
| // Health implements our grpc.checker interface. This method will be called | ||
| // periodically to set the gRPC service's healthpb.Health.Check() status. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just something that crossed my mind when reading this, no need to address this here: #8471
Add a
refreshOverridesfunc to theratelimits.limitRegistrystruct. Instead of populating the staticoverridesfield once when creating an instance of the struct, call the new func at startup and then every 30 minutes.Emit relevant logs and metrics from
limitRegistry.Add an
OverridesFromDBlimiter config flag (for RA & WFE) to read overrides from the DB instead of a file.Flatten
newLimitRegistry.*()methods' logic into their sole caller,NewTransactionBuilder().Rename
loadDefaults()&loadOverrides(), appendingFromFilefor clarity/consistency.test: Add ra-sct-provider dependency on SA.
Important for deployment: If the
OverridesFromDBconfig flag is enabled, an RA now depends on the SA in order to load overrides. The RA must be added as a gRPC client ofsa.StorageAuthorityReadOnly.CPS Compliance Review:
OverridesFromDBonly controls how we load rate limit overrides, which has no compliance implications beyond general API availability (e.g. for revocation). I've checked our CP/CPS to confirm we make no related stipulations.Fixes #8382