Skip to content

Add database models and queries for External Subnets#9679

Merged
bnaecker merged 12 commits into
mainfrom
external-subnet-database-crud
Jan 23, 2026
Merged

Add database models and queries for External Subnets#9679
bnaecker merged 12 commits into
mainfrom
external-subnet-database-crud

Conversation

@bnaecker

Copy link
Copy Markdown
Collaborator
  • Adds model types for Subnet Pools, Subnet Pool Members, links between Subnet Pools and Silos, and External Subnets.
  • Add queries for CRUD on all four, which enforce our invariants like no overlapping IP members or subnets, and do allocation for External Subnets from an explicitly-requested subnet, or automatically from a pool. Pools can be specified either by name / ID or an optional IP version. In the later cases, the Subnet Pool is chosen from the pools linked to the silo.
  • Adds some scaffolding useful for coming integration with the public API, such as authz types and checks and create-time parameter types.
  • Closes Database models and queries for external subnets #9578

bnaecker and others added 3 commits January 18, 2026 13:20
- Adds model types for Subnet Pools, Subnet Pool Members, links between
  Subnet Pools and Silos, and External Subnets.
- Add queries for CRUD on all four, which enforce our invariants like no
  overlapping IP members or subnets, and do allocation for External
  Subnets from an explicitly-requested subnet, or automatically from a
  pool. Pools can be specified either by name / ID or an optional IP
  version. In the later cases, the Subnet Pool is chosen from the pools
  linked to the silo.
- Adds some scaffolding useful for coming integration with the public
  API, such as authz types and checks and create-time parameter types.
- Closes #9578
@bnaecker

Copy link
Copy Markdown
Collaborator Author

Excepting a CI flake, this is ready for review. The bad news is this PR is pretty big. The good is that it's mostly tests. The SQL involved is pretty subtle, so I went out of the way test it as much as I could.

Comment thread nexus/auth/src/authz/api_resources.rs Outdated
Comment on lines +1248 to +1254
impl oso::PolarClass for SubnetPoolList {
fn get_polar_class_builder() -> oso::ClassBuilder<Self> {
oso::Class::builder()
.with_equality_check()
.add_attribute_getter("fleet", |_: &SubnetPoolList| FLEET)
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to add a comment here or anything, since this looks like it's probably really straightforward if you know some basics, but could you give me (or point me at) a quick tutorial on Oso sometime?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dave Pacheco would be the best person for detailed questions, but we have a debugging guide here and some general background in the block comment starting here.

Comment thread nexus/db-lookup/src/lookup.rs Outdated
Comment thread nexus/db-model/src/external_subnet.rs Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing these are some sort of schema update fragments? Is it like each DDL statement has to be in its own file?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's right. We have some docs here, but the gist is that you write each individual step in the update in its own file, and we run them under in a transaction during an upgrade. We have a test that ensures running all the updates results in the same schema as the overall dbinit.sql file.

Comment on lines +679 to +681
let Error::InvalidRequest { message } = &err else {
panic!("expected a NotFound error, found {err:#?}");
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use assert_matches! instead of panic here?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah definitely. I like this formulation since it destructures the returned type too, but either would work.

Comment on lines +871 to +883
let _ = datastore
.unlink_subnet_pool_from_silo(
opctx,
&authz_pool,
&pool,
&authz_silo,
)
.await
.expect("able to unlink pool from silo");
let _ = datastore
.link_subnet_pool_to_silo(opctx, &new_authz_pool, &authz_silo, true)
.await
.expect("able to link pool to silo after unlinking other");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust style question: would it be (marginally) better to have tests like this return Result, so we could just write:

datastore.unlink_subnet_pool_from_silo(...).await?;

Or is it preferable to do the let _ = [...].expect() for a more explicit test failure message?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally do the latter, to get more contextual information about the failure and why we didn't expect it. But you can definitely write tests that return a Result directly. I believe the test harness will unwrap it in that case, but I'm not sure.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to make sure we're using the expected indexes for these queries? Can we EXPLAIN ANALYZE or something with CRDB?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do that sometimes, yes. We have a trait for explaining a query and getting the results. The main way we enforce using indexes is that all queries are forbidden to result in a plan with a full table or index scan. (The exception here is that you can disable this for data migrations during a schema upgrade.) This doesn't depend on the size of the table. No full scans at any point, of anything.

I use that ExplainableAsync trait in this PR, in the can_explain_insert_external_subnet_from_explicit_ip_query test.

Comment on lines +297 to +299
/// This query is pretty complicated. First, we support creating a subnet in a
/// few different ways, such as by an explicit IP subnet, automatically from a
/// specific pool, or automatically from a pool linked to the current Silo.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to express all cases in one query, or is possible to split them up?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on what you mean, but there really are four separate queries generated by this function:

  • requesting by explicit subnet
  • requesting by explicit pool (name or ID)
  • requesting by default pool of specific version
  • requesting the default pool, of which there has to be only one.

If you mean, "can we split up all these CTEs into different queries", then...you could, at the cost of an explicit interactive transaction. We try to avoid that, but it's obviously a tradeoff. This query is big and complicated, and the contention we might run into could be worth a simpler query. I personally try very hard to avoid explicit transactions, but I'm probably overly-sensitive to that.

Comment thread nexus/db-queries/src/db/queries/external_subnet.rs Outdated

@FelixMcFelix FelixMcFelix left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ben, I have a few higher-level questions but otherwise this looks like it's handling some very tricky problems correctly. Thanks a lot for the robust testing here, particularly around gap sizing and allocation.

Comment thread nexus/db-model/src/external_subnet.rs Outdated
Comment thread nexus/db-queries/src/db/datastore/external_subnet.rs Outdated
Comment thread nexus/db-queries/src/db/datastore/external_subnet.rs
Comment thread nexus/db-queries/src/db/datastore/external_subnet.rs Outdated
Comment thread nexus/types/src/external_api/params.rs Outdated
Comment thread schema/crdb/dbinit.sql
Comment thread nexus/types/src/external_api/params.rs Outdated
Comment thread nexus/db-queries/src/db/queries/external_subnet.rs Outdated
- Fix a couple of index and schema update fragment bugs
- Comments on some subtle / confusing database details
- Handle overflow at the very end of IP address space
- Remove `silo_id` from the `external_subnet` table.
- Various typo fixes
@bnaecker

bnaecker commented Jan 22, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks @FelixMcFelix and @mergeconflict for the thoughtful reviews. I've addressed almost all the comments. There are two outstanding items.

  • Check for overlap between IP Pools and Subnet Pools when inserting either. Add a test on both sides.
  • Make a new Nexus API version for the changed ExternalSubnetAllocator variants that disallow both a subnet and a pool, since the former fully-constrains the latter.

Sorry to say both of these will add even more the PR. At least its mostly generated code? 😬

- Ensure we catch and handle overlap between both when inserting either
- Add tests for both cases
- Remove straggling `silo_id` from `external_subnet` table.

@mergeconflict mergeconflict left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending fixes!

Comment thread nexus/db-model/src/schema_versions.rs
Comment thread nexus/db-queries/src/db/datastore/external_subnet.rs
- Put back accidentally deleted schema change
- Fix schema update indexes again
- Clippy nits
@bnaecker

Copy link
Copy Markdown
Collaborator Author

@FelixMcFelix Let me know if you have any other feedback, I'd love to land this tomorrow.

@FelixMcFelix FelixMcFelix left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One catch on the default max prefix length, but otherwise looks solid. Thanks!

Comment thread nexus/db-model/src/external_subnet.rs Outdated
@bnaecker bnaecker enabled auto-merge (squash) January 23, 2026 15:59
@bnaecker bnaecker merged commit 3c2b2d1 into main Jan 23, 2026
16 checks passed
@bnaecker bnaecker deleted the external-subnet-database-crud branch January 23, 2026 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Database models and queries for external subnets

3 participants