[reconfigurator] Move insertions into crucible_dataset from executor RPW to rendezvous RPW#7397
Conversation
|
I ran Obviously R12 does not have the new rendezvous tables. I then upgraded to this branch. Immediately after the upgrade, prior to running schema migration, A short time after schema migration, the Nexus instances came online, and we can see that it knows all the Crucible dataset records already exist: and it populated I then went through the standard "add sled" process. The first blueprint after adding the new sled sets up the sleds disks (including its debug datasets!) and NTP zone; after that blueprint was executed, we see 10 new debug datasets (with a different The second blueprint after adding the new sled sets up Crucible datasets and zones; after that blueprint was executed, we see the expected output from the RPW and 38 rows in I also confirmed I could create disks and instances, and that we see the new sled being used and |
5464d11 to
6602604
Compare
| // The dataset was already in the database. If it was in | ||
| // service, we should have noted that it already exists. | ||
| if prep.disposition == ArbitraryDisposition::InService { | ||
| expected_stats.num_already_exist += 1; |
There was a problem hiding this comment.
I think it's fine to have this code here, but are we ever actually triggering it via proptest? I thought this condition was only possible to hit with a concurrent Nexus
There was a problem hiding this comment.
We are triggering this, yes; commenting it out results in this failure:
thread 'crucible_dataset::tests::proptest_reconciliation' panicked at nexus/reconfigurator/rendezvous/src/crucible_dataset.rs:311:9:
Test failed: assertion `left == right` failed
left: CrucibleDatasetsRendezvousStats { num_inserted: 0, num_already_exist: 1, num_not_in_inventory: 0 }
right: CrucibleDatasetsRendezvousStats { num_inserted: 0, num_already_exist: 0, num_not_in_inventory: 0 }.
minimal failing input: prep = {
0: DatasetPrep {
disposition: InService,
in_inventory: false,
in_database: true,
},
}
However, you're right that the test doesn't trigger issues from concurrent Nexuses. There are two places where we increment num_already_exist; the common one is "some previous invocation already recorded this dataset", which this test exercises:
omicron/nexus/reconfigurator/rendezvous/src/crucible_dataset.rs
Lines 113 to 117 in 6602604
We don't exercise the "concurrent Nexus made us lose the race" one:
omicron/nexus/reconfigurator/rendezvous/src/crucible_dataset.rs
Lines 143 to 146 in 6602604
|
|
||
| for (id, prep) in prep { | ||
| let id: DatasetUuid = u32_to_id(id); | ||
|
|
There was a problem hiding this comment.
(style) do you think this is easier or harder to read? I found myself wandering through the nested conditionals a bit; wondering if this is easier to parse for humans.
let in_db_before = prep.in_database;
let in_service = prep.disposition == ArbitraryDisposition::InService;
let in_inventory = prep.in_inventory;
let in_db_after = datastore_datasets.contains_key(&id);
// Validate rendezvous output
match (in_db_before, in_service, in_inventory) {
(true, true, _) => expected_stats.num_already_exist += 1,
(false, true, false) => expected_stats.num_not_in_inventory += 1,
(false, true, true) => expected_stats.num_inserted += 1,
(_, false, _) => (),
}
// Validate database state
match (in_db_before, in_service, in_inventory) {
(true, _, _) => assert!(in_db_after, "Existing dataset missing"),
(false, false, _) => assert!(!in_db_after, "Expunged dataset should not get new records"),
(false, true, false) => assert!(!in_db_after, "Should wait for inventory before inserting records"),
(false, true, true) => assert!(in_db_after, "In-service datasets in inventory should be inserted"),
}There was a problem hiding this comment.
Yeah, that does seem clearer to me. Switched in a419aee
| // Return the result as a `serde_json::Value` | ||
| match result { | ||
| Ok(()) => json!({}), | ||
| Ok(stats) => json!({ |
There was a problem hiding this comment.
non-blocking suggestion: I'd add a struct in nexus/types/src/internal_api/background.rs that can be shared with omdb instead of free form json here.
|
|
||
| #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] | ||
| pub struct RendezvousStats { | ||
| pub crucible_dataset: CrucibleDatasetsRendezvousStats, |
There was a problem hiding this comment.
non-blocking question: no stats for debug datasets?
This PR builds on #7386. I'll test both of them together on a racklette (will put results in a comment on this PR).
This PR contributes to addressing (but doesn't fully address) a couple issues:
crucible_datasetrendezvous table, but we still need useful output fordebug_datasetI'm inclined to say this PR fully addresses a couple other issues, but it's debatable:
datasetas a general table is now gone and replaced with more specific rendezvous tables, so I think this one can be closed?