Skip to content

fix(eks): nodegroup synthesis fails when configured with an AMI type that is not compatible to the default instance type#12441

Merged
mergify[bot] merged 15 commits intomasterfrom
epolon/eks-nodegroup-default-instance-types
Jan 12, 2021
Merged

fix(eks): nodegroup synthesis fails when configured with an AMI type that is not compatible to the default instance type#12441
mergify[bot] merged 15 commits intomasterfrom
epolon/eks-nodegroup-default-instance-types

Conversation

@iliapolo
Copy link
Copy Markdown
Contributor

@iliapolo iliapolo commented Jan 10, 2021

Note: both issues here were introduced in #11962

Problem 1

When creating a Nodegroup without passing instance types, we currently default to use t3.medium:

const instanceTypes = props.instanceTypes ?? (props.instanceType ? [props.instanceType] : Nodegroup.DEFAULT_INSTANCE_TYPES);

This default is then used to calculate the expected AMI type, and assert that the configured AMI type is indeed as expected:

if (props.amiType && props.amiType !== determinedAmiType) {
throw new Error(`The specified AMI does not match the instance types architecture, either specify ${determinedAmiType} or dont specify any`);
}

However, a user might configure instance types on the launch template, and an AMI type on the nodegroup. In this scenario, we still use the default instance type to perform the validation, which will fail if the ami type is not compatible with it.

To make things worse, we don't actually use the default instance type at all, apart from the validation:

instanceTypes: props.instanceTypes ? props.instanceTypes.map(t => t.toString()) :
props.instanceType ? [props.instanceType.toString()] : undefined,

And in-fact, this default was only introduced in this PR, which also added the problematic validation.

Solution

Drop the default instance type altogether, like it was before. The new validation will only take place if the user explicitly configured both instanceTypes and amiType on the nodegroup. Since the default value was never actually used, this doesn't incur any behavior change.

Problem 2

When a launch template is used, we currently ignore the value of amiType explicitly passed by the user:

// AmyType is not allowed by CFN when specifying an image id in your launch template.
amiType: props.launchTemplateSpec === undefined ? determinedAmiType : undefined,

This behavior means that users who configured a launch template without a custom ami, and passing an amiType to the nodegroup, would now result in no ami specification at all, defaulting to whatever EKS does, which might not be what the user had in mind.

There's no good reason to do this, we should either throw a validation error if both are used, or pass the explicit value nevertheless, even though it might cause problems.

Solution

When a user explicitly passes an AMI type, just use it and assume the user knows what he/she is doing. When a user does not explicitly pass it, only apply the default if a launch template is not used.

If we apply the default in the presence of a launch template, a user would not be able to escape if they also have a custom AMI in the launch template.

This change means that users who previously "relied" on this override, might now experience a deployment failure if they are using a custom AMI in the launch template, those users can resolve the problem by removing the amiType property from the nodegroup (since it wasn't used, its not needed). I don't imagine many such users exist since this behavior is new and it doesn't make much sense to configure both a custom AMI and an amiType.


Fixes #12389

BREAKING CHANGE: Explicitly passing amiType to nodegroups will now take affect even if a launch template is configured as well. If your launch template contains a custom AMI, this will cause a deployment failure, to resolve, remove the explicit amiType from the nodegroup configuration.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link
Copy Markdown

gitpod-io bot commented Jan 10, 2021

@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Jan 10, 2021
@mergify mergify bot added the contribution/core This is a PR that came from AWS. label Jan 10, 2021
@iliapolo iliapolo marked this pull request as ready for review January 10, 2021 12:22
@iliapolo iliapolo requested a review from eladb January 10, 2021 12:22
@iliapolo iliapolo added the pr/do-not-merge This PR should not be merged at this time. label Jan 10, 2021
@iliapolo
Copy link
Copy Markdown
Contributor Author

Not quite ready yet

@iliapolo iliapolo marked this pull request as draft January 10, 2021 12:46
@iliapolo iliapolo marked this pull request as ready for review January 10, 2021 13:33
@iliapolo iliapolo removed the pr/do-not-merge This PR should not be merged at this time. label Jan 12, 2021
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Jan 12, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Jan 12, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@mergify mergify bot merged commit 5f6f0f9 into master Jan 12, 2021
@mergify mergify bot deleted the epolon/eks-nodegroup-default-instance-types branch January 12, 2021 13:36
@aws-cdk-automation
Copy link
Copy Markdown
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject6AEA49D1-qxepHUsryhcu
  • Commit ID: 10dc541
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

mergify bot pushed a commit that referenced this pull request Dec 17, 2021
As `Nodegroup` now supports different AMI types including Bottlerocket for both x86_64 and ARM_64, we cannot determine correct amiType simply from the `instanceTypes` property(#17641 ). However, when `instanceTypes` are provided, we still can check: 

1. if instance types of different CPU architectures are mxied and throw an error
2. if the provided `amyType` compatible with the instanceTypes


If user opt in Bottlerocket or any other AMI types other than AL2, users have to specify the `amiType` explicitly. If it's unspecified, we will use AL2 implicitly to avoid breaking changes, which is the default behavior previously.

The only case `amiType` has to be undefined is that when custom AMI is defined in the launch template. As we can't check this case, users have to explicitly leave it undefined. We add a notice in the property doc string for this.


Related to #12441
Fixes: #17641 

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this pull request Feb 21, 2022
As `Nodegroup` now supports different AMI types including Bottlerocket for both x86_64 and ARM_64, we cannot determine correct amiType simply from the `instanceTypes` property(aws#17641 ). However, when `instanceTypes` are provided, we still can check: 

1. if instance types of different CPU architectures are mxied and throw an error
2. if the provided `amyType` compatible with the instanceTypes


If user opt in Bottlerocket or any other AMI types other than AL2, users have to specify the `amiType` explicitly. If it's unspecified, we will use AL2 implicitly to avoid breaking changes, which is the default behavior previously.

The only case `amiType` has to be undefined is that when custom AMI is defined in the launch template. As we can't check this case, users have to explicitly leave it undefined. We add a notice in the property doc string for this.


Related to aws#12441
Fixes: aws#17641 

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service contribution/core This is a PR that came from AWS.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

(eks): Managed node with LaunchTemplate failed to set amiType

3 participants