Attached (External) Subnets#911
Conversation
374d344 to
a832697
Compare
a832697 to
5dc70c8
Compare
This PR reworks certain aspects of how NAT and gateway layers are constructed to enable attached (external) subnets to function. The primary aim here is that traffic to/from these subnets (owned by the host) do not undergo NAT, and bypass spoof detection as transit IPs would. A few wider changes have been necessary to ensure that these can be attached/detached without breaking any existing transit IPs, and to ensure that traffic originated from an external subnet cannot be directed towards a private VPC recipient.
5dc70c8 to
2e7cf69
Compare
| use opte_api::Protocol; | ||
|
|
||
| pub trait Ip: Into<super::IpAddr> { | ||
| pub trait Ip: Into<super::IpAddr> + Send + Sync { |
There was a problem hiding this comment.
Followup from #883 -- this moves the Send + Sync bounds up so that we don't need to explicitly specify them in as many spots.
| pub fn attach_subnet( | ||
| port: &Port<VpcNetwork>, | ||
| inet_gw_map: Option<&InternetGatewayMap>, | ||
| vpc_mappings: &Arc<VpcMappings>, | ||
| req: AttachSubnetReq, | ||
| ) -> Result<(), OpteError> { |
There was a problem hiding this comment.
I had a bit of uncertainty as to whether this and detach_subnet should live here, gateway, or some other module since the feature crosses multiple layers.
There was a problem hiding this comment.
Living in its own module seems to make the most sense to me. I would not think to come to the nat module to look for attached subnet machinery.
bnaecker
left a comment
There was a problem hiding this comment.
I've not looked through the entire PR yet, but this seems solid. Nice interface for attaching subnets, and some helpful cleanup along the way. Thanks!
zeeshanlakhani
left a comment
There was a problem hiding this comment.
one Q atm, but this is looking really good. I just want to do some realigning with the RFD before approval.
| }; | ||
|
|
||
| encap_external(inner_pkt, bsvc_phys, guest_phys) | ||
| encap_external(inner_pkt, *BSVC_PHYS, guest_phys) |
There was a problem hiding this comment.
Where is this const being defined now? Or is it generated?
There was a problem hiding this comment.
It's defined in opte_test_utils, here:
opte/lib/opte-test-utils/src/lib.rs
Lines 1000 to 1004 in eb15888
zeeshanlakhani
left a comment
There was a problem hiding this comment.
This looks good at the MVP level, until we do further integrated testing.
rcgoodfellow
left a comment
There was a problem hiding this comment.
Thanks Kyle! Comments/questions follow
| /// be received/sent. | ||
| #[derive(Debug, Clone, Serialize, Deserialize, Default, Eq, PartialEq)] | ||
| pub struct AttachedSubnetConfig { | ||
| /// Denotes whether this attached subnet is an external IP block, |
There was a problem hiding this comment.
So if it's not external, then NAT will be applied? What is that being used for?
There was a problem hiding this comment.
Again, attached VPC subnets. The instance owns, say, 172.30.0.0/22 and sends trafffic from 172.30.2.128 to 1.1.1.1 -- NAT still needs to be applied, since we're in the VPC's view of that private address space.
The mechanics in OPTE are simple enough for this, since we just don't touch the NAT layer. I know we've punted on the control plane logic for attached VPC subnets to the next release, but I figured it'd be better to avoid more churn.
There was a problem hiding this comment.
In that case we'd be implementing an N:1 NAT in that case where N is the size of the subnet.
I think we'll need to make sure this NAT-from-attached-vpc-subnet path is only on VPC subnets attached to the primary interface so if the guest sends non-VPC traffic out a secondary instance for whatever reason, we don't scoop that up into a NAT vortex and replys come back on the primary interface.
There was a problem hiding this comment.
That tracks. I know at least today the limit on any kind of normal external IP (SNAT/EIP/FIP) is that those must be bound to the primary interface.
| pub fn attach_subnet( | ||
| port: &Port<VpcNetwork>, | ||
| inet_gw_map: Option<&InternetGatewayMap>, | ||
| vpc_mappings: &Arc<VpcMappings>, | ||
| req: AttachSubnetReq, | ||
| ) -> Result<(), OpteError> { |
There was a problem hiding this comment.
Living in its own module seems to make the most sense to me. I would not think to come to the nat module to look for attached subnet machinery.
| // strong isolation which some customers require. | ||
| // | ||
| // In future we want this to be a tunable property of the VPC. In this | ||
| // case we would require an extra table/poptrie per VPC, containing all |
There was a problem hiding this comment.
This works out nicely for attached subnets. But I think regular floating/ephemeral IPs it will be trickier for large pools since we have no subnet aggregation and the routing table will essentially be a bunch of /32s or /128s for v4/v6 respectively.
There was a problem hiding this comment.
I'll put some of this commentary in, you're right that this hypothetical is a bit thornier for individual EIPs. I think it's a little worse in that you can't often aggregate /32s and /128s here -- the value type would be the PhysAddr or (IpAddr, VID) corresponding to the target instance. Although I wouldn't expect the table size to be any larger than the V2P map?
There was a problem hiding this comment.
Yeah that's what I was trying to say, is that we cannot really aggregate here. Well ... we could with a clever aggregating routing protocol, but we'd need to disaggregate on migration and would continuously be recomputing aggregations as instances are created/destroyed. It seems like it'd be tricky to get right.
This PR reworks certain aspects of how NAT and gateway layers are constructed to enable attached (external) subnets to function. The primary aim here is that traffic to/from these subnets (owned by the host) do not undergo NAT, and bypass spoof detection as transit IPs would. A few wider changes have been necessary to ensure that these can be attached/detached without breaking any existing transit IPs, and to ensure that traffic originated from an external subnet cannot be directed towards a private VPC recipient.
VpcCfgrather than working directly on the port's ruleset. This also gives us the opportunity to set these on port create rather than using a followup ioctl.DhcpCfginVpcCfgnow.VpcCfginXdeDevand the port itself seemed... misleading, at best. When XDE needs that, we now reach throughport.network().StatefulActions should never have had mutable access to a packet'sActionMeta(the string-to-string map), because they may never be rerun after an LFT is created on subsequent slowpath walks. It is now up to the generatedActionDescs to update metadata -- this trait's role is similar toStaticAction, which is allowed mutable access to this metadata.ActionMetaergonomics to make this easier to express and implement.As far as the control plane is concerned, it should need to only:
Instance(<name>)target (in turn resolved to an IP target when reifying rules for the port, as happens with instance targets today).Answers the functional dataplane requirements of #890. Closes #703.