Change `dataset` table to only hold Crucible datasets by jgallagher · Pull Request #7386 · oxidecomputer/omicron

jgallagher · 2025-01-22T20:28:11Z

The diff of this PR is large but very mechanical. The actual changes here are:

The dataset table has been renamed crucible_dataset (and correspondingly, the Dataset model struct is now CrucibleDataset, and Datastore::dataset_* methods are now Datastore::crucible_dataset_*)
The zone_name column has been dropped (it was always NULL for Crucible datasets, enforced by a CHECK constraint)
The quota, reservation, and compression columns have been dropped (they were always NULL for Crucible datasets, although this was not enforced by a CHECK constraint; this information is kept in the blueprint and is not read from this table)
The ip, port, and size_used columns now have NOT NULL constraints (they already had NOT NULL constraints specifically for Crucible datasets via a CHECK constraint; correspondingly, the CrucibleDataset model type has non-optional fields for these columns)

and everything else is followup to account for this. Most of that followup is just renaming things. A few nontrivial exceptions:

There were a handful of places that had to deal with the optionality of address by failing if it wasn't set for Crucible datasets; that error handling is now gone since address is no longer optional.
Some tests specifically operated on the dataset table using a mix of Crucible and non-Crucible datasets. I tried to update these as reasonably as I could while keeping the spirit of the test in tact, but a closer look at these bits would be good.
The rack init request in the RSS -> Nexus handoff now only includes Crucible datasets.

I think this gets us most of the way toward crucible_dataset being a "rendezvous table" in the RFD 540 sense. As of this PR, the blueprint executor task is still responsible for adding entries to this table, but I'll move that to the rendezvous background task in a followup.

Fixes #7299. Gets us halfway to #7301 by fixing the dataset side of that issue (bp_omicron_dataset still needs work).

jmpesp · 2025-01-23T16:11:31Z

+
+/// Database representation of a Crucible dataset's live information.
+///
+/// This includes the socket address of the Crucible downstairs that owns this


s/downstairs/agent/

fixed in deeec80

jmpesp · 2025-01-23T16:19:53Z

 #[derive(Debug, Default, Serialize, Deserialize)]
 pub struct CrucibleResourcesV2 {
-    pub datasets_and_regions: Vec<(Dataset, Region)>,
+    pub datasets_and_regions: Vec<(CrucibleDataset, Region)>,


We'll need to be able to deserialize these existing blobs: will the JSON format change with the type change here? I don't think so but I've been wrong before!

It does change but only in ways that should be compatible, I think:

The old JSON blobs will have zone_name, quota, reservation, and compression fields that we'll ignore when deserializing.

The old JSON blobs could have had null for the ip, port, or size_used fields. If they did, we will now fail to deserialize them. But those should have always been set for Crucible datasets, so should not have null?

I was planning to take this through a "deploy R12 on a racklette, upgrade to this branch" test before merging. (Although I'd like to wait to do that until I have the next, smaller PR up, to test both at the same time.) What should I do to the racklette before upgrading to ensure I have some of these JSON blobs present?

Ooh, maybe a better check: where is this JSON in CRDB? I can check dogfood and see if we have any that would cause problems.

Hm well that didn't work. Assuming this is volume.resources_to_clean_up, dogfood only has empty resources for V1 and V2, and V3 looks like it only has IDs in it?

root@[fd00:1122:3344:105::3]:32221/omicron> select resources_to_clean_up from volume where resources_to_clean_up is not null and resources_to_clean_up like '{"V2%'; resources_to_clean_up --------------------------------------------------------------- {"V2":{"datasets_and_regions":[],"snapshots_to_delete":[]}} {"V2":{"datasets_and_regions":[],"snapshots_to_delete":[]}} (2 rows) Time: 12ms total (execution 12ms / network 0ms) root@[fd00:1122:3344:105::3]:32221/omicron> select resources_to_clean_up from volume where resources_to_clean_up is not null and resources_to_clean_up like '{"V1%'; resources_to_clean_up ------------------------------------------------------------------ {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}} {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}} {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}} {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}} {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}} {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}} {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}} {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}} {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}} {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}} {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}} {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}} {"V1":{"datasets_and_regions":[],"datasets_and_snapshots":[]}} (13 rows)

Yeah, unfortunately in this case volume records are eventually hard deleted.

What should I do to the racklette before upgrading to ensure I have some of these JSON blobs present?

Actually, I think test_deserialize_old_crucible_resources covers this case!

jmpesp · 2025-01-23T16:23:52Z

+        let dataset = dataset?;
+        let id = dataset.id();
+
+        // If this dataset already exists, do nothing.


This looks like a behaviour change from what was here before?

I'll make the claim it is not, even though it looks like it is. 😅

Before, we had this:

// If this dataset already exists, only update it if it appears different from what exists // in the database already. let action = if let Some(db_dataset) = existing_datasets.remove(&id) { let db_config: DatasetConfig = db_dataset.try_into()?; let bp_config: DatasetConfig = bp_dataset.clone().try_into()?; if db_config == bp_config { num_unchanged += 1; continue; }

When flattening a db_dataset down to a DatasetConfig to see if there are any relevant changes, we were throwing away ip, port, and size_used, only leaving us with information derived from pool_id, zone_name, quota, reservation, and compression:

impl TryFrom<Dataset> for omicron_common::disk::DatasetConfig { type Error = Error; fn try_from(dataset: Dataset) -> Result<Self, Self::Error> { let compression = if let Some(c) = dataset.compression { c.parse().map_err(|e: anyhow::Error| { Error::internal_error(&e.to_string()) })? } else { omicron_common::disk::CompressionAlgorithm::Off }; Ok(Self { id: dataset.identity.id.into(), name: omicron_common::disk::DatasetName::new( omicron_common::zpool_name::ZpoolName::new_external( ZpoolUuid::from_untyped_uuid(dataset.pool_id), ), dataset.kind.try_into_api(dataset.zone_name)?, ), inner: omicron_common::disk::SharedDatasetConfig { quota: dataset.quota.map(|q| q.into()), reservation: dataset.reservation.map(|r| r.into()), compression, }, }) } }

We dropped all the those fields we were comparing except pool_id. So the equivalent code here would be something like:

if existing_dataset.pool_id == dataset.pool_id { num_unchanged += 1; continue; }

but I don't believe it's possible for a dataset's pool ID to change, right? So I dropped what would look like a weird / spurious check and replaced it with just the "have we inserted this already" check that's here now.

If it is possible for a dataset's pool ID to change, the new code is a behavior change and is wrong.

A separate question is: is it possible for a dataset's IP or port to change? If so, this code is wrong, but so was the old code.

We dropped all the those fields we were comparing except pool_id.

I don't follow this part.

I don't believe it's possible for a dataset's pool ID to change, right?

I believe the same thing haha

We dropped all the those fields we were comparing except pool_id.

I don't follow this part.

The old code for deciding whether to try to upsert was this:

// If this dataset already exists, only update it if it appears different from what exists // in the database already. let action = if let Some(db_dataset) = existing_datasets.remove(&id) { let db_config: DatasetConfig = db_dataset.try_into()?; let bp_config: DatasetConfig = bp_dataset.clone().try_into()?; if db_config == bp_config { num_unchanged += 1; continue; }

Those DatasetConfig structs are only comparing a subset of fields: pool_id, zone_name, quota, reservation, and compression. Of all those fields, the new CrucibleDataset only has pool_id. So the behaviorally-equivalent code would be to just compare "does the blueprint dataset pool_id match the database dataset pool_id", but that's a silly check to make when we know it can't change.

jmpesp · 2025-01-23T16:25:46Z

+                if !matches!(zone.zone_type, BlueprintZoneType::Crucible(_)) {
+                    return None;
+                }
+                let dataset = zone


is the type here CrucibleDataset?

No, it's DurableDataset, which contains a reference to an OmicronZoneDataset:

omicron/nexus/types/src/deployment/zone_type.rs

Lines 237 to 241 in db656d7

pub struct DurableDataset<'a> {

pub dataset: &'a OmicronZoneDataset,

pub kind: DatasetKind,

pub address: SocketAddrV6,

}

jmpesp · 2025-01-23T16:27:31Z

+        // datasets.
        //
        // It should only insert these new zones.
        let new_zones = [


This seems wrong, though I haven't looked deeply into this test: it's inserting two of the same zone that are different types than before this diff.

I think this is okay, but please double check me. Before, the execution code would insert dataset records for any DatasetKind since dataset held all dataset kinds; now, it only inserts records for Crucible datasets. The test is trying to ensure that new datasets added after the last execution are added, which is now restricted to "new Crucible datasets added after the last execution are added", so the new zones we use to test have to be Crucible zones.

Before, the execution code would insert dataset records for any DatasetKind since dataset held all dataset kinds; now, it only inserts records for Crucible datasets.

I think this is why I was suspicious: based on that understanding this vec should have had four members, mixing together the crucible and non-crucible datasets.

There should be two versions of this test: one for "durable" datasets and one for crucible datasets.

I might be missing something - does it help if all_datasets already included Crucible datasets? (Guaranteed by https://github.com/oxidecomputer/omicron/pull/7386/files/26ccf5f2e0f22721cd4d1d26bac4d9c4dc659e6b#diff-d4165f273b361528feca0c934574df6365cd86f5688ab81665f35975620183a1R312-R316, and then we pass in all_datasets.iter().chain(new_zones); that used to provide a mix of crucible and non-crucible, and now just passes in "more crucible".)

We chatted about what made this test confusing; I took a stab at addressing it in 1201f9a

jmpesp · 2025-01-23T16:32:16Z

-    kind omicron.public.dataset_kind NOT NULL,
+    /*
+     * Contact information for the dataset: socket address of the Crucible
+     * downstairs service that owns this dataset


s/downstairs/agent/

fixed in deeec80

smklein · 2025-01-23T18:06:01Z

+set local disallow_full_table_scans = off;
+
+DELETE FROM omicron.public.crucible_dataset WHERE kind != 'crucible';


I was going to ask if we wanted to add a constraint that the kind must be 'crucible' before dropping, to ensure we've cleared other dataset types, but I hesitated to recommend that because "blocking the update" also seems bad.

To confirm, if there happen to be non-crucible entries for any reason, we're okay dropping them quietly?

There definitely are non-crucible entries! But dropping them is intentional: we're converting this into "the Crucible dataset rendezvous table", and after #7341 and #7350, there are no consumers of this table left that query for anything other than kind = 'crucible'.

smklein · 2025-01-23T18:08:34Z

-      (kind != 'zone') OR
-      (kind = 'zone' AND zone_name IS NOT NULL)
-    )
+    size_used INT NOT NULL


Not your fault that this isn't labelled, but could we add: "this field is controlled by Nexus, and is updated as region allocations are performed from this dataset"

Added in deeec80

smklein · 2025-01-23T18:08:54Z


 /*
- * A dataset of allocated space within a zpool.
+ * Table tracking the contact information and size used by datasets associated


Nit: Is it worth documenting that this is a rendezvous table?

Sure - added in deeec80

smklein · 2025-01-23T18:14:56Z

+        ip -> Inet,
+        port -> Int4,


nice, thanks for making these non-nullable, makes this a lot easier to use

smklein · 2025-01-23T18:19:34Z

    }

-    pub async fn dataset_upsert_if_blueprint_is_current_target(
+    pub async fn crucible_dataset_upsert_if_blueprint_is_current_target(


(no action needed) since this is invoked by the blueprint executor to ensure dataset records exist, I assume this can also go away when we move more fully to the rendezvous table model.

(all to say, thanks for this PR, this is a great direction)

Yep, that's the plan for the next PR 👍

smklein · 2025-01-23T21:23:29Z

-            }
-
-            datastore
-                .dataset_delete_if_blueprint_is_current_target(


Rad to see this unified, it felt gross as it was written (deletion of a table row happening in multiple possible spots). Thanks again for this change.

jgallagher · 2025-01-24T19:56:39Z

Notes from testing this PR along with its successor #7397 on london are at #7397 (comment).

…r RPW to rendezvous RPW (#7397) This PR builds on #7386. I'll test both of them together on a racklette (will put results in a comment on this PR). This PR contributes to addressing (but doesn't fully address) a couple issues: * #7392: This PR adds useful output for the `crucible_dataset` rendezvous table, but we still need useful output for `debug_dataset` * #6999: This PR removes a step from execution that currently depends on successful completion of earlier step to post datasets to sled-agents, since we now confirm insertions are okay via inventory collection I'm inclined to say this PR fully addresses a couple other issues, but it's debatable: * #7210: Are we okay with IP and port still being here now that this table is just crucible datasets? (IMO it's still conceptually weird to say a dataset has an IP/port, but maybe it's not as bad now?) * #6998 `dataset` as a general table is now gone and replaced with more specific rendezvous tables, so I think this one can be closed?

jgallagher added 8 commits January 22, 2025 11:59

db-model: dataset -> crucible_dataset

2f8c748

db-queries: dataset -> crucible_dataset

148c744

openapi regenerate and fixups

f9a2325

schema migration

9274f3a

expectorate

8471012

update executor and tests

f064527

rename datastore dataset_* methods -> crucible_dataset_*

9d7979e

fix schema migration idempotency

26ccf5f

jgallagher requested review from jmpesp and smklein January 22, 2025 20:28

jmpesp requested changes Jan 23, 2025

View reviewed changes

smklein approved these changes Jan 23, 2025

View reviewed changes

jgallagher added 4 commits January 24, 2025 09:23

Merge branch 'main' into john/dataset-is-only-crucible

5ed1e19

fixes for changes on main

260d10a

comment updates from PR feedback

deeec80

test cleanup

1201f9a

jgallagher requested a review from jmpesp January 24, 2025 14:47

jgallagher mentioned this pull request Jan 24, 2025

[reconfigurator] Move insertions into crucible_dataset from executor RPW to rendezvous RPW #7397

Merged

jmpesp approved these changes Jan 27, 2025

View reviewed changes

jgallagher merged commit 625ad84 into main Jan 27, 2025

jgallagher deleted the john/dataset-is-only-crucible branch January 27, 2025 18:02

	pub struct DurableDataset<'a> {
	pub dataset: &'a OmicronZoneDataset,
	pub kind: DatasetKind,
	pub address: SocketAddrV6,
	}

		set local disallow_full_table_scans = off;

		DELETE FROM omicron.public.crucible_dataset WHERE kind != 'crucible';

Conversation

jgallagher commented Jan 22, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jgallagher commented Jan 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants