Prevent Nexus from initializing until CRDB is populated#712
Conversation
davepacheco
left a comment
There was a problem hiding this comment.
(Not sure if you're still looking for a review here. I think we discussed offline having RSS not start Nexus until database populate was done. I think that's a lot cleaner.)
To record some thoughts that we talked about here:
- This is probably fine
- It sort of feels consistent with the general practice of having things come online in a degraded mode until their deps are available, rather than, say, aborting on startup and requiring human intervention.
- I think this case is different from that general case because: a dependency going offline is an anticipated condition that can always happen and so has to be handled at runtime. Having Nexus come up with an unpopulated database can be avoided altogether.
- Makes me a little uneasy: this change makes some assumptions load-bearing that were previously sort of accidental: that "dbinit.sql" will always set the "version" metadata and that that will always be the last thing that it does. It's not easy to enforce this.
- I think the property we want to check is "the schema matches something that we're capable of operating on". That seems like a much harder problem.
- But as I said above, these are vague fears and if this makes things a lot simpler, it's probably fine. But if we can instead guarantee that the database is populated before we start Nexus, that feels cleaner and more robust.
| value STRING(1023) NOT NULL | ||
| value STRING(1023) NOT NULL, | ||
|
|
||
| PRIMARY KEY(name, value) |
There was a problem hiding this comment.
If we keep this: I think we only want the "name" in the primary key.
I think this is potentially another check we should consider - I actually think Nexus validating that the tables-in-CRDB can be parsed might not be a bad idea to do explicitly. But this can be punted when we have more experience with upgrade / schema migrations.
Yeah, the workaround I'm using in the |
Update to Propolis commit 50cb28f5 to obtain the following updates: ``` 50cb28f server: separate statuses of inbound and outbound migrations (#661) 61e1481 Don't return None from BlockData::block_for_req when attachment is paused (#711) 79e243b Build and clippy fixes for Rust 1.79 (#712) 3c0f998 Modernize fw_cfg emulation 7a65be9 Update and prune dependencies ``` 50cb28f5 changes the Propolis instance status and migration status APIs to report both inbound and outbound migration status. Update sled agent accordingly. Tests: `cargo nextest`, smoke test on a dev cluster.
Stops Nexus from booting until CRDB appears initialized, at least once.
This change allows other bootstrapping aspects of the control plane to initialize Nexus and CRDB concurrently, as Nexus will await CRDB formatting before launching itself.