Skip to content

Add dual-stack IP config for internal NICs#9389

Merged
bnaecker merged 15 commits into
mainfrom
dual-stack-internal-nic-types
Nov 20, 2025
Merged

Add dual-stack IP config for internal NICs#9389
bnaecker merged 15 commits into
mainfrom
dual-stack-internal-nic-types

Conversation

@bnaecker

Copy link
Copy Markdown
Collaborator
  • Add dual-stack VPC private address configuration type and include it in the shared NetworkInterface type.
  • Add database model support for reading / writing the dual-stack NIC type to the database, handling serialization into optional fields.
  • Update all the callsites to handle the new dual-stack-aware type.
  • Add a bunch of conversions for the APIs which rely on that new type, of which there are many. This also adds the conversions and older types into the sled-agent-types crate, so they can be used in a few places that don't directly depend on the sled-agent-api crate itself, notably reconfigurator and nexus-inventory.
  • Update the sled-agent reconciler to deserialize previous versions of its sled-configuration ledgers, convert them, and write them out again as the new versions.
  • Updates the OPTE PortManager type with the new dual-stack support. This is only for private IP addresses, though. We still need some work to support OPTE ports with dual stack external addresses. This is about half of Enable dual-stack OPTE ports in the sled-agent #9247, the VPC-private part.
  • Closes Want dual-stack support for NICs between Nexus and sled-agent #9246

@bnaecker bnaecker marked this pull request as draft November 12, 2025 04:49
@bnaecker bnaecker force-pushed the dual-stack-internal-nic-types branch from 9d04b7a to 83a21bb Compare November 12, 2025 20:41
@bnaecker bnaecker marked this pull request as ready for review November 12, 2025 20:42
@bnaecker bnaecker force-pushed the dual-stack-internal-nic-types branch from 83a21bb to 2033231 Compare November 12, 2025 20:48
@bnaecker

Copy link
Copy Markdown
Collaborator Author

Thanks in advance to the reviewers. It turns out we use the NetworkInterface type basically everywhere, and so this change is uncomfortably large.

- Add dual-stack VPC private address configuration type and include it
  in the shared NetworkInterface type.
- Add database model support for reading / writing the dual-stack NIC
  type to the database, handling serialization into optional fields.
- Update all the callsites to handle the new dual-stack-aware type.
- Add a bunch of conversions for the APIs which rely on that new type,
  of which there are many. This also adds the conversions and older
  types into the `sled-agent-types` crate, so they can be used in a few
  places that don't directly depend on the `sled-agent-api` crate
  itself, notably reconfigurator and `nexus-inventory`.
- Update the sled-agent reconciler to deserialize previous versions of
  its sled-configuration ledgers, convert them, and write them out again
  as the new versions.
- Updates the OPTE `PortManager` type with the new dual-stack support.
  This is only for private IP addresses, though. We still need some work
  to support OPTE ports with dual stack external addresses. This is
  about half of #9247, the VPC-private part.
- Closes #9246
@bnaecker bnaecker force-pushed the dual-stack-internal-nic-types branch from 2033231 to fe87906 Compare November 12, 2025 20:49
@bnaecker

Copy link
Copy Markdown
Collaborator Author

I'll figure out how to resolve the conflicts with the multicast work on Monday.

@bnaecker

Copy link
Copy Markdown
Collaborator Author

I'm going to wait to fix the conflicts until we address some changes in the way the conflicting code was structured. I'll still have conflicts to resolve that point, but that will be entirely mechanical. The PR is still ready for review as it stands.

@rcgoodfellow rcgoodfellow left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in getting this posted @bnaecker. Review comments follow. However, this touches on a lot of Omicron systems I'm not very familiar with like inventory, blueprints, ledgers and reconfigurator. It would be good to have someone who works in those systems to take a look.

Comment thread common/src/api/internal/shared/network_interface/v2.rs Outdated
Comment thread dev-tools/reconfigurator-cli/src/lib.rs Outdated
Comment thread illumos-utils/src/opte/port_manager.rs Outdated
Comment thread illumos-utils/src/opte/port_manager.rs Outdated
Comment thread illumos-utils/src/opte/port_manager.rs
Comment thread nexus/db-model/src/network_interface.rs Outdated
Comment thread nexus/db-queries/src/db/datastore/deployment/external_networking.rs Outdated
Comment thread sled-agent/api/src/lib.rs
path = "/vmms/{propolis_id}",
versions = ..VERSION_ADD_DUAL_STACK_SHARED_NETWORK_INTERFACES,
}]
async fn v6_vmm_register(

@rcgoodfellow rcgoodfellow Nov 18, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was very confusing to me at first as i was reading v6 as IPv6 and did not realize this was the old method being moved to API version 6.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Luckily once this catches up with main it'll be somewhere between v8 and v10 instead 😅

Comment thread sled-agent/src/services.rs Outdated
Comment thread common/src/api/internal/shared/network_interface/v2.rs Outdated
Comment thread common/src/api/internal/shared/network_interface/v2.rs Outdated
Comment thread common/src/api/internal/shared/network_interface/v2.rs Outdated
Comment thread common/src/api/internal/shared/network_interface/mod.rs Outdated
Comment thread illumos-utils/src/opte/mod.rs Outdated
Comment thread sled-agent/api/src/lib.rs
path = "/vmms/{propolis_id}",
versions = ..VERSION_ADD_DUAL_STACK_SHARED_NETWORK_INTERFACES,
}]
async fn v6_vmm_register(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Luckily once this catches up with main it'll be somewhere between v8 and v10 instead 😅

Comment thread sled-agent/config-reconciler/src/ledger.rs
Comment thread sled-agent/config-reconciler/src/ledger.rs
Comment thread sled-agent/config-reconciler/src/ledger/legacy_configs.rs Outdated
Comment thread illumos-utils/src/opte/port_manager.rs Outdated
- Avoid panics in DB conversion and Omicron-to-OPTE IP configuration
  conversion
- Clean up some error messages
- Add helpers to return any IP address for an OPTE port / gateway
- Remove explicit v2 module for current version of NIC type
This adds only a v8 into the public sled-agent-types crate with the
previous version. It adds infallible conversions from older API types to
and from v8, rather than changing those conversions to be directly to
the latest type. Also call the "next" version of the actual API handler
too, rather than jumping to the latest.
@bnaecker

Copy link
Copy Markdown
Collaborator Author

Ok @jgallagher @rcgoodfellow I believe I've addressed your comments and fixed up all the merge conflicts with main. This should be ready for another look when you get a chance. Thanks!

@jgallagher jgallagher left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment thread illumos-utils/src/opte/port_manager.rs Outdated
.subnet()
.iter()
// The 0th address is the network address, and OPTE's
// gateway sits at the next. See

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this call v6.opte_gateway() instead?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh! Yeah, I added those methods and never used them. Thanks!

Comment thread illumos-utils/src/opte/port_manager.rs Outdated
floating_ips: &[IpAddr],
is_ipv4_only: bool,
) -> Result<Ipv4Cfg, Error> {
let gateway_ip = v4.subnet().first_host().into();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this call v4.opte_gateway() instead?

Comment thread sled-agent/api/src/lib.rs Outdated
versions =
VERSION_ADD_PROBE_PUT_ENDPOINT..VERSION_ADD_DUAL_STACK_SHARED_NETWORK_INTERFACES,
}]
async fn v6_probes_put(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be v9 instead of v6?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should, yes. Thanks.

@bnaecker bnaecker enabled auto-merge (squash) November 20, 2025 00:02
@bnaecker bnaecker merged commit f062472 into main Nov 20, 2025
17 checks passed
@bnaecker bnaecker deleted the dual-stack-internal-nic-types branch November 20, 2025 03:21
bnaecker added a commit that referenced this pull request Dec 18, 2025
This is a follow-up to #9389 that I missed. We were already separating
transit IPs by version in the code before and after the database. This
pushes that separation all the way into the database, which will make it
easier to implement updates to dual-stack NICs in the public API.
Separating them lets clients update the transit IPs for only one
version, without some janky read-modify-write code that ensures we don't
clobber the transit IPs for the _other_ version.
bnaecker added a commit that referenced this pull request Dec 19, 2025
This is a follow-up to #9389 that I missed. We were already separating
transit IPs by version in the code before and after the database. This
pushes that separation all the way into the database, which will make it
easier to implement updates to dual-stack NICs in the public API.
Separating them lets clients update the transit IPs for only one
version, without some janky read-modify-write code that ensures we don't
clobber the transit IPs for the _other_ version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Want dual-stack support for NICs between Nexus and sled-agent

4 participants