VPC Subnet Routing#490
Conversation
This also adds in the `RouterClass` specifier, which does not yet do anything.
9149747 to
1ebbbb6
Compare
RFD 21 goes into some detail describing use cases where users might want to use one instance as a tunnel endpoint for a VPN, and how traffic towards an arbitrary CIDR can be routed to an instance for, e.g., software routing purposes. To enable this, we need to be able to setup holes in the anti-spoof filters in the gateway, which are added here as a new pair of ioctls to add and remove these holes. It's not clear from the RFD whether this is to be an on/off toggle or as fine-grained as the new ioctls allow, but we can suoport either case here.
zeeshanlakhani
left a comment
There was a problem hiding this comment.
Just some minor stuff while looking at this code in draft.
| dest: IpCidr, | ||
| vpc_mappings: Arc<VpcMappings>, | ||
| ) -> (Rule<Finalized>, Rule<Finalized>) { | ||
| let vpc_meta = Arc::new(VpcMeta::new(vpc_mappings)); |
There was a problem hiding this comment.
n00b2 Q: Would it make sense to implement MetaAction (used with vpc_meta / Action::Meta below) on vpc_mappings directly or impl it on Arc<VpcMappings> so you don't have to initialize the Arc here (again)?
There was a problem hiding this comment.
The double-Arc is really unfortunate for sure. We could impl MetaAction for VpcMappings, but we'd need to recast the design of the MetaAction to use something additional like RuleDisplay instead of Display, I think?
There was a problem hiding this comment.
@FelixMcFelix ah interesting, I'll look further at this then.
Breaks out the transit IPs/CIDR allowlist functions as requested.
luqmana
left a comment
There was a problem hiding this comment.
Thanks for putting this together Kyle! The demo was great and excited to see this land. Looks good overall but one question about dealing with partial failures during add/remove.
FelixMcFelix
left a comment
There was a problem hiding this comment.
Thanks for the review Luqman!
| // | ||
| // `prefix_len` comes from the passed in `cidr` argument. | ||
| // | ||
| // One bit is used to ensure that 'custom' router rules take precedence |
There was a problem hiding this comment.
Q: is there a test that specifically tests precedence? It's probably already captured, but I thought to ask.
There was a problem hiding this comment.
Thanks for taking a look. The new system vs. custom route precedence is tested in the new intra_subnet_routes_with_custom test, adding a second rule with an identical destination CIDR:
opte/lib/oxide-vpc/tests/integration_tests.rs
Lines 4064 to 4090 in 789b927
I don't believe we have any explicit tests on e.g. the priority values we're using to get LPM-like behaviour, though.
luqmana
left a comment
There was a problem hiding this comment.
One small suggestion for opteadm but otherwise lgtm! Thanks Kyle.
| /// Control whether traffic on the CIDR should be allowed for | ||
| /// inbound or outbound traffic. | ||
| #[arg(long = "dir")] | ||
| direction: Direction, |
There was a problem hiding this comment.
Given opteadm is primarily a dev tool, I think making this an Option<Direction> and doing the ioctl calls for both directions if it's None is fine.
With rollback behaviour!
This PR wires up all the backing machinery for VPC subnet routing, and
automatically resolves and pushes updated rules to sleds using an RPW.
This allows instances in all subnets of a VPC to talk with one another
-- assuming no firewall rules have been configured otherwise. At a high
level, this works by a few changes:
* During the VPC create saga, we now push two rules explicitly to the
system router -- default routes from `(0.0.0.0/0, ::/0) ->
inetgw:outbound`.
* Any CRUD operation on a VPC subnet will reconcile the set of VPC
subnet routes within the system router to have one entry per subnet.
This takes the form `subnet:{name} -> subnet:{name}` for each subnet,
which are later resolved to both v4 and v6 entries.
* Ports are created using route information known to sled-agent -- this
defaults to an empty route set for instances/probes, and an internet
gateway rule for services to enable early NTP sync.
* Routes are sync'd with sleds using a new background task. Broadly,
this asks each sled for the set of VPCs and subnets it has ports on, and
a version for the current route set installed in each. The background
task will use this information to determine which routes must be
rebuilt, and will send updated versions out in response.
The most immediate consequence in this PR is that hosts within a subnet
-- on different VPCs -- will be able to talk with one another at last.
The user facing API (#2116) will be re-enabled in a concurrent PR --
#5823 -- as will NIC spoof detection hole-punching.
Depends on oxidecomputer/opte#490.
Closes #2232, Fixes #1336.
---
A few pieces will block tests passing & merge-readiness:
- [x] Creation of a `lab-2.0-opte-0.32` image.
- [x] Merge of oxidecomputer/maghemite#274 (and updating all the right
SHAs in this PR).
This PR introduces various features needed in OPTE side for VPC subnet routing:
192.168.0.0/16 -> ip(instance_abc)for software routing or enabling VPN endpoints.I'm leaving this in draft while the omicron-side pieces are in development.