VPC Subnet Routing [2/2] -- Custom Routers and NIC 'transit IP' lists#5823
Conversation
OPTE now prevents itself from being unloaded if its underlay state is set. Currently, underlay setup is performed only once, and it seems to be the case that XDE can be unloaded in some scenarios (e.g., `a4x2` setup). However, a consequence is that removing the driver requires an extra operation to explicitly clear the underlay state. This PR adds this operation to the `cargo xtask virtual-hardware destroy` command. This is currently blocked on opte#485 being approved/merged. Closes #5314.
These update in response to VPC subnet changes. Now to plumb them into OPTE.
Currently there are no triggers attached to most of the operations that will cause us to either a) push or b) re-resolve VPC routes, but this lays the basis for sled-agent and the background task to talk in terms of versions.
We'll get to that in the next PR.
495c69b to
ab7223b
Compare
| pub identity: IdentityMetadataCreateParams, | ||
| /// The location that matched packets should be forwarded to. | ||
| pub target: RouteTarget, | ||
| /// A CIDR block (or named subnet) which this route will apply to. |
There was a problem hiding this comment.
Is this accurate? I see a destination can be an IP, IP net, VPC, or subnet.
There was a problem hiding this comment.
Thanks for looking over this!
It can be any of those, except for VPCs -- that will likely relax in future with VPC peering, but it's not something that makes sense right now. I'll try and rework the wording for this, I maybe want to make it clearer that 'destination' matches on the address in routed packets.
There was a problem hiding this comment.
Ok, sounds good. I was asking because I saw more stuff on the RouteDestination type. But it's fine if there are things that are allowed by the type that are nonetheless rejected by the endpoint logic as invalid, and I see you're doing that nicely here.
| /// Custom routers apply in addition to the VPC-wide *system* router, and have | ||
| /// higher priority than the system router for an otherwise | ||
| /// equal-prefix-length match. | ||
| pub custom_router: Option<NameOrId>, |
There was a problem hiding this comment.
I was wondering if this could just be called router and then on the VpcSubnet view we can call it router_id, and we always include the ID. But I see now, "Custom routers apply in addition to the VPC-wide system router" means that doesn't make sense, right? Because when there is a custom router, there is still also the system router. It's not a default vs. custom situation. So I think this is good.
david-crespo
left a comment
There was a problem hiding this comment.
The API side looks good to me. transit_ips vec seems fine. We can always tweak it after we see how it feels in practice.
| /// The location that matched packets should be forwarded to. | ||
| pub target: RouteTarget, | ||
| /// The set of destination IP addresses or subnets that this route | ||
| /// will match packets against. |
There was a problem hiding this comment.
Sorry to keep nitpicking, but saying "set" and "subnets" might be confusing because this can only be one subnet. I can't think of a better wording — maybe rely on the definiton of RouteDestination to cover what is inside it, and here emphasize its role in the route. I like this bit in the doc comment on RouteDestination, though it's kind of long:
/// When traffic is to be sent to a destination that is within a given
/// `RouteDestination`, the corresponding `RouterRoute` applies, and traffic
/// will be forward to the `RouteTarget` for that rule.
There was a problem hiding this comment.
Yeah, the existing RouteDestination text is good there. I think if we lean into just showing its role and delegating the details to the type, I'd go for:
/// Selects which traffic this routing rule will apply to.
pub destination: RouteDestination,This PR wires up all the backing machinery for VPC subnet routing, and
automatically resolves and pushes updated rules to sleds using an RPW.
This allows instances in all subnets of a VPC to talk with one another
-- assuming no firewall rules have been configured otherwise. At a high
level, this works by a few changes:
* During the VPC create saga, we now push two rules explicitly to the
system router -- default routes from `(0.0.0.0/0, ::/0) ->
inetgw:outbound`.
* Any CRUD operation on a VPC subnet will reconcile the set of VPC
subnet routes within the system router to have one entry per subnet.
This takes the form `subnet:{name} -> subnet:{name}` for each subnet,
which are later resolved to both v4 and v6 entries.
* Ports are created using route information known to sled-agent -- this
defaults to an empty route set for instances/probes, and an internet
gateway rule for services to enable early NTP sync.
* Routes are sync'd with sleds using a new background task. Broadly,
this asks each sled for the set of VPCs and subnets it has ports on, and
a version for the current route set installed in each. The background
task will use this information to determine which routes must be
rebuilt, and will send updated versions out in response.
The most immediate consequence in this PR is that hosts within a subnet
-- on different VPCs -- will be able to talk with one another at last.
The user facing API (#2116) will be re-enabled in a concurrent PR --
#5823 -- as will NIC spoof detection hole-punching.
Depends on oxidecomputer/opte#490.
Closes #2232, Fixes #1336.
---
A few pieces will block tests passing & merge-readiness:
- [x] Creation of a `lab-2.0-opte-0.32` image.
- [x] Merge of oxidecomputer/maghemite#274 (and updating all the right
SHAs in this PR).
This PR builds on #5777 to provide the Custom routers for subnets as described in RFD21. This entails a few things:
unpublished = truetag from the user API for VPC routers and routes.custom_routerfield in subnetPOSTandPUTrequests.transit_ipslist, which denotes an additional set of CIDR blocks that a NIC is allowed to send and receive traffic on. This is set duringPOSTand/orPUTon instances which are stopped. This is a key feature to enable software routing by instances, as today's default behaviour drops any packets not matching an assigned IP for an instance.There are some allowances around currently non-existent features:
inetgw:outbound, which appears in our existing rules. Using this target sends packets upstream as it does today.Closes #2116.