Conversation
force-pushed from f18d66e to d8b78f1
Coverage increased (+0.2%) to 44.655% when pulling 5a3a9f91daa78cf2bfea3b29f129f8cf40ca8c55 on pr/tgraf/aws-eni-ipam into f4955ec on master.
test-me-please

test-me-please
test-me-please Tracked down to
force-pushed from d8b78f1 to 8f98f47

force-pushed from 8f98f47 to 6f1108c
test-me-please

The following tests failed: failing because #8322 is not merged yet
I think overall, the architecture of this is solid and well-documented. I'd probably have to immerse myself in the code by running it in a cluster and playing around to understand the nitty-gritty details given the size of the PR.
Most of my comments are documentation / clarification about certain behaviors & operations.
Are we marking this as GA for v1.6? Or beta?
That question is still open. There is a large demand for this feature, and several users are already testing it and planning to use it in critical environments as soon as possible. I think we should make a call on how to label it before we release, based on the experience gained until then.
force-pushed from 3928c1f to 65a2477
test-me-please
Signed-off-by: Thomas Graf <thomas@cilium.io>
Code running during daemon initialization will rely on being able to access custom resources. Signed-off-by: Thomas Graf <thomas@cilium.io>
The following additional information is exposed via the API:

* IPAM mode
* Gateway IP (optional)
* Additional CIDRs to which the IP has access (optional)
* MAC of the master interface (optional)

This is in preparation for the upcoming ENI mode.

Signed-off-by: Thomas Graf <thomas@cilium.io>
Building on the previous commit, this allows an IPAM plugin to provide the additional allocation information. Signed-off-by: Thomas Graf <thomas@cilium.io>
This adds an initial, minimal version of a CiliumNode CRD-backed IP allocator. It is enabled by setting --ipam=crd and will hand out IPs based on the IPs made available via the custom resource. An outside operator or user can provide IPs via the map spec.ipam.available. The agent will use those IPs for allocation as they become available and update the map status.ipam.used to list all used IPs. This is a minimal version targeted for use by the operator for ENI support. Future versions could become more sophisticated, targeting manual configuration use cases or integration with other IPAM systems as well. Signed-off-by: Thomas Graf <thomas@cilium.io>
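The spec/status split described above can be sketched as follows. This is a hypothetical, simplified illustration (the real CiliumNode types carry more fields and different names): the agent may hand out any IP published in spec.ipam.available that is not yet listed in status.ipam.used.

```go
package main

import "fmt"

// AllocationIP is a simplified stand-in for a per-IP entry in the
// CiliumNode custom resource.
type AllocationIP struct {
	Owner string // endpoint currently using the IP, if any
}

// IPAMSpec mirrors spec.ipam: IPs made available by the operator or a user.
type IPAMSpec struct {
	Available map[string]AllocationIP
}

// IPAMStatus mirrors status.ipam: IPs the agent has handed out.
type IPAMStatus struct {
	Used map[string]AllocationIP
}

// nextFree picks any published IP that is not yet marked as used.
func nextFree(spec IPAMSpec, status IPAMStatus) (string, bool) {
	for ip := range spec.Available {
		if _, inUse := status.Used[ip]; !inUse {
			return ip, true
		}
	}
	return "", false
}

func main() {
	spec := IPAMSpec{Available: map[string]AllocationIP{"10.0.0.1": {}, "10.0.0.2": {}}}
	status := IPAMStatus{Used: map[string]AllocationIP{"10.0.0.1": {Owner: "pod-a"}}}
	ip, ok := nextFree(spec, status)
	fmt.Println(ip, ok) // prints: 10.0.0.2 true
}
```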
The AWS ENI allocator builds on top of the CRD-backed allocator. Each node creates a ciliumnodes.cilium.io custom resource matching the node name when Cilium starts up for the first time on that node. It contacts the EC2 metadata API to retrieve the instance ID, instance type, and VPC information and populates the custom resource with this information. ENI allocation parameters are provided as agent configuration options and are passed into the custom resource as well.

The architecture ensures that only a single operator communicates with the EC2 service API, to avoid rate-limiting issues in bigger clusters. A pre-allocation watermark keeps IP addresses available for use on nodes at all times, without needing to contact the EC2 API when a new pod is scheduled in the cluster.

The Cilium operator listens for new ciliumnodes.cilium.io custom resources and starts managing the IPAM aspect automatically. It scans the EC2 instances for existing ENIs with associated IPs and makes them available via the spec.ipam.available field. It then constantly monitors the used IP addresses in the status.ipam.used field and automatically creates ENIs and allocates more IPs as needed to meet the IP pre-allocation watermark. This ensures that there are always IPs available.

The selection of subnets to use for allocation, as well as the attachment of security groups to new ENIs, can be controlled separately for each node. This makes it possible to hand out pod IPs with differing security groups on individual nodes.

Configuration
-------------

* The Cilium agent and operator must be run with the option --ipam=eni, or the option ipam: eni must be set in the ConfigMap. This will enable ENI allocation in both the node agent and the operator.

Cache of ENIs and Subnets
-------------------------

The operator maintains a list of all EC2 ENIs and subnets associated with the AWS account in a cache.
For this purpose, the operator performs the following two EC2 API operations:

* DescribeNetworkInterfaces
* DescribeSubnets

The cache is updated once per minute, or after an IP allocation or ENI creation has been performed. When triggered by an allocation or creation, the operation is performed at most once every 15 seconds.

Publication of available ENI IPs
--------------------------------

Following the update of the cache, all CiliumNode custom resources representing nodes are updated to publish any new IPs that have become available. In this process, all ENIs with an interface index greater than spec.eni.first-interface-index are scanned for available IPs. All IPs found are added to spec.ipam.available. Each ENI meeting these criteria is also added to status.eni.enis. If this update caused the custom resource to change, the custom resource is updated using the Kubernetes API methods Update() and/or UpdateStatus() if available.

Determination of ENI IP deficits
--------------------------------

The operator constantly monitors all nodes and detects deficits in available ENI IP addresses. The check to recognize a deficit is performed on two occasions:

* When a CiliumNode custom resource is updated
* When all nodes are scanned at a regular interval (once per minute)

When determining whether a node has a deficit in IP addresses, the following calculation is performed:

    spec.eni.preallocate - (len(spec.ipam.available) - len(status.ipam.used))

Upon detection of a deficit, the node is added to the list of nodes which require IP address allocation. When a deficit is detected by the interval-based scan, the allocation order of nodes is determined based on the severity of the deficit, i.e. the node with the biggest deficit will be at the front of the allocation queue. The allocation queue is handled on demand, but at most every 5 seconds.
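The deficit formula above can be expressed as a small helper. The function name and the clamping of a surplus to zero are illustrative assumptions, not part of the original code:

```go
package main

import "fmt"

// ipDeficit mirrors the deficit calculation from the commit message:
// spec.eni.preallocate - (len(spec.ipam.available) - len(status.ipam.used)).
// A negative result (surplus) is clamped to zero, assuming no action is needed.
func ipDeficit(preallocate, available, used int) int {
	deficit := preallocate - (available - used)
	if deficit < 0 {
		return 0 // surplus of spare IPs, nothing to allocate
	}
	return deficit
}

func main() {
	// 8 IPs should be pre-allocated; 10 are published and 6 in use,
	// leaving 4 spare, so 4 more IPs are needed.
	fmt.Println(ipDeficit(8, 10, 6)) // prints: 4
}
```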
IP Allocation
-------------

When performing IP allocation for a node with an address deficit, the operator first looks at the ENIs which are already attached to the instance represented by the CiliumNode resource. All ENIs with an interface index greater than CiliumNode.Spec.ENI.FirstInterfaceIndex are considered for use. The operator will then pick the first already-allocated ENI which meets the following criteria:

* The ENI has associated addresses which are not yet used, or the number of addresses associated with the ENI is less than the instance-type-specific limit.
* The subnet associated with the ENI has IPs available for allocation.

The following formula is used to determine how many IPs are allocated on the ENI:

    min(AvailableOnSubnet, min(AvailableOnENI, NeededAddresses + spec.eni.max-above-watermark))

This means that the number of IPs allocated in a single allocation cycle can be less than what is required to fulfill spec.eni.preallocate. In order to allocate the IPs, the method AssignPrivateIpAddresses of the EC2 service API is called. When no more ENIs meeting the above criteria are available, a new ENI is created.

ENI Creation
------------

As long as an instance type is capable of allocating additional ENIs, ENIs are allocated automatically based on demand. When allocating an ENI, the first operation performed is to identify the best subnet. This is done by searching through all subnets and finding a subnet that matches the following criteria:

* The VPC ID of the subnet matches spec.eni.vpc-id
* The availability zone of the subnet matches spec.eni.availability-zone
* The subnet contains all tags as specified by spec.eni.subnet-tags

If multiple subnets match, the subnet with the most available addresses is selected. After selecting the subnet, the interface index is determined. For this purpose, all existing ENIs are scanned and the first unused index greater than spec.eni.first-interface-index is selected.
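The allocation formula above can be sketched as a helper function; the name `allocCount` and its parameter names are assumptions for illustration:

```go
package main

import "fmt"

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// allocCount mirrors the formula from the commit message:
// min(AvailableOnSubnet, min(AvailableOnENI, NeededAddresses + maxAboveWatermark)).
func allocCount(availSubnet, availENI, needed, maxAboveWatermark int) int {
	return minInt(availSubnet, minInt(availENI, needed+maxAboveWatermark))
}

func main() {
	// The ENI can take 5 more IPs, the subnet has 100 free, and 8 IPs are
	// needed (+2 above-watermark headroom): the ENI limit wins, so 5.
	fmt.Println(allocCount(100, 5, 8, 2)) // prints: 5
}
```

Because the result is capped by both the ENI and the subnet, a single cycle may fall short of spec.eni.preallocate, which is why allocation is retried until the deficit is resolved.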
After determining the subnet and interface index, the ENI is created and attached to the EC2 instance using the methods CreateNetworkInterface and AttachNetworkInterface of the EC2 API. The security groups attached to the ENI will be equivalent to spec.eni.security-groups. The description will be in the following format: "Cilium-CNI (<EC2 instance ID>)"

ENI Deletion Policy
-------------------

ENIs can be marked for deletion when the EC2 instance to which the ENI is attached is terminated. In order to enable this, the option CiliumNode.Spec.ENI.DeleteOnTermination can be enabled. If enabled, the ENI is modified after creation using ModifyNetworkInterface to specify this deletion policy.

Node Termination
----------------

When a node or instance terminates, the Kubernetes apiserver will send a node deletion event. This event is picked up by the operator, which deletes the corresponding ciliumnodes.cilium.io custom resource.

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>
This adds an option --auto-create-ciliumnode-resource to automatically create a CiliumNode custom resource on startup with the ENI parameters automatically derived from the AWS metadata API. The created custom resource is bound to the lifecycle of the Kubernetes node resource so it gets automatically deleted when the node is removed. Signed-off-by: Thomas Graf <thomas@cilium.io>
This adds operator support for ENI allocation. See the previous commits for the ENI details. The operator will respect the `--ipam=eni` option just like the agent. In general, it makes sense to also enable --enable-metrics to get ENI-specific allocation metrics. Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>
With the introduction of the ENI allocation capability, the operator must run in host networking mode, as otherwise it would depend on its own allocation capability in order to get scheduled. Signed-off-by: Thomas Graf <thomas@cilium.io>
Support specification of all ENI parameters via the CNI configuration file using the option `--read-cni-conf`. Also add support for a CNI chaining configuration, in which case the configuration is automatically derived from the Cilium-specific section of the chaining configuration. Signed-off-by: Thomas Graf <thomas@cilium.io>
This extends support to specify:

- Priority
- Mark mask
- Source and destination filters

Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>
The ENI interface is automatically created and detected but needs to be brought up, and the MTU must be specified. The MTU is derived from the primary interface by the agent and then inherited by all secondary ENIs. Routing rules are set up to ensure that all traffic from an endpoint egresses through the ENI which corresponds to the IP address assigned to that endpoint. Signed-off-by: Thomas Graf <thomas@cilium.io>
This adds ENI support to the masquerading logic. For external SNAT, the destination address exclusion is derived from the VPC's primary CIDR. For now, the list of interfaces to masquerade on must be specified with --egress-masquerade-interfaces using an interface selector. This is required because filtering all local ENIs with a static rule is not possible yet; it will be resolved once we switch to BPF-based masquerading. Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>
Both are needed to perform CRD metric accounting for the IPAM CRD plugin. Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>
CI passed, rebasing to merge.
force-pushed from 5a3a9f9 to 7f45853
This PR adds AWS ENI IP allocation and datapath support. The details can be found in the individual commit messages and in the documentation commit.
Fixes: #6430