Skip to content

fix(eks): can't deploy with Bottlerocket amiType #17775

Merged
mergify[bot] merged 26 commits intoaws:masterfrom
pahud:pahud/eks-unable-to-use-bottlerocket-17641
Dec 17, 2021
Merged

fix(eks): can't deploy with Bottlerocket amiType #17775
mergify[bot] merged 26 commits intoaws:masterfrom
pahud:pahud/eks-unable-to-use-bottlerocket-17641

Conversation

@pahud
Copy link
Copy Markdown
Contributor

@pahud pahud commented Nov 30, 2021

As Nodegroup now supports different AMI types including Bottlerocket for both x86_64 and ARM_64, we cannot determine correct amiType simply from the instanceTypes property(#17641 ). However, when instanceTypes are provided, we still can check:

  1. if instance types of different CPU architectures are mxied and throw an error
  2. if the provided amyType compatible with the instanceTypes

If user opt in Bottlerocket or any other AMI types other than AL2, users have to specify the amiType explicitly. If it's unspecified, we will use AL2 implicitly to avoid breaking changes, which is the default behavior previously.

The only case amiType has to be undefined is that when custom AMI is defined in the launch template. As we can't check this case, users have to explicitly leave it undefined. We add a notice in the property doc string for this.

Related to #12441
Fixes: #17641


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@gitpod-io
Copy link
Copy Markdown

gitpod-io bot commented Nov 30, 2021

@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Nov 30, 2021
@pahud
Copy link
Copy Markdown
Contributor Author

pahud commented Nov 30, 2021

Hey @neilkuan I'll need your initial review. Let's discuss this offline.

throw new Error(`The specified AMI does not match the instance types architecture, either specify ${expectedAmiType} or dont specify any`);
/**
* if the user explicitly configured instance types, we can't caculate the expected ami type as we support
* Amazon Linux 2 and Bottlerocket now. However we can check:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we do the opposite? Take the AMI type as a starting point (if it's provided) and figure out which instance types are valid for it?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds like good, wdyt @pahud.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@otaviomacedo Yes this is exactly what I am working on.

I introduced possibleAmiTypes in this PR and calculate from the provided instance types.

Let's say if you provide all x86 instance types the possibleAmiTypes would only be AL2_X86_64 and BOTTLEROCKET_X86_64. In this case if amiType is provided we can check if it's included in the possibleAmiTypes.

@neilkuan
Copy link
Copy Markdown
Contributor

neilkuan commented Dec 2, 2021

for testing I know about if I not define amitype default will be AL2_x86_64.

so if I define instanceTypes to ['t4g.medium'],I will get error message from cloudformation.

[t4g.medium] is not a valid instance type for requested amiType AL2_x86_64 (Service: AmazonEKS; Status Code: 400; Error Code: InvalidParameterException; Request ID: 51f86ddf-0218-42c4-86d7-b02ffbe18f76; Proxy: null)

telegram-cloud-photo-size-5-6143385765976059443-y

Copy link
Copy Markdown
Contributor

@kirintwn kirintwn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some NITS. Other than that, looks good.
Just need to be cautious that we need to calculate the recommended amiType corresponding to the instanceTypes when the user doesn't specify amiType or ami in launchTemplate. Also need to care the backward compatibility when calculating the recommended amiType for user.


Maybe add something like this in the if (instanceTypes && instanceTypes.length > 0) block:

if (!props.amiType && !props.launchTemplateSpec) {
  // calculate the recommended `amiType` corresponding to the `instanceTypes`
}

or when constructing the CfnNodegroup, use:

const resource = new CfnNodegroup(this, 'Resource', {
  ...
  amiType: props.launchTemplateSpec ? props.amiType : (props.amiType ?? possibleAmiTypes[0]),
  ...
});

@pahud
Copy link
Copy Markdown
Contributor Author

pahud commented Dec 13, 2021

Some NITS. Other than that, looks good. Just need to be cautious that we need to calculate the recommended amiType corresponding to the instanceTypes when the user doesn't specify amiType or ami in launchTemplate. Also need to care the backward compatibility when calculating the recommended amiType for user.

Maybe add something like this in the if (instanceTypes && instanceTypes.length > 0) block:

if (!props.amiType && !props.launchTemplateSpec) {
  // calculate the recommended `amiType` corresponding to the `instanceTypes`
}

or when constructing the CfnNodegroup, use:

const resource = new CfnNodegroup(this, 'Resource', {
  ...
  amiType: props.launchTemplateSpec ? props.amiType : (props.amiType ?? possibleAmiTypes[0]),
  ...
});

Thanks for the suggested changes.

Let's dive a little bit deeper here. We have some optional properties in this construct:

  1. props.instanceTypes is optional
  2. props.launchTemplateSpec is optional
  3. props.amiType is optional

When props.instanceTypes is specified, there might be some cases:

  1. amiType undefined with launchTemplateSpec undefined
    Check:
    • ensure the instanceTypes provided are compatible with the default amiType, i.e. AL2_X86_64.
    • to avoid BC, we use possibleAmiTypes[0] as the default amiType, which should be AL2.
  2. amiType undefined with launchTemplateSpec defined
    Yes it's possible to have launchTemplateSpec defined while the instance types are also defined in props.instanceTypes property.
    Check:
    • Same as above but we don't propose a default amiType. We leave props.amiType as-is.
  3. amiType defined with launchTemplateSpec undefined
  • we check if props.amiType is included in possibleAmiTypes we generate from props.instanceTypes.
  1. amiType defined with launchTemplateSpec defined
  • we still have to check if props.amiType is included in possibleAmiTypes we generate from props.instanceTypes. As we can't see if AMI is also defined in the launch template, it's user responsibility to leave props.amiType undefined if AMI is also specified in the launch template otherwise the deploy will fail.

That being said, it doesn't make difference where props.launchTemplateSpec is defined or not. The validation check is literally the same.

Now let's check the following cases:

Case 1:

instanceTypes: ['t4g.medium']
amiType: `undefined`
launchTemplateSpec: `undefined`

Previously users would have AL2_ARM_64 as the expectedAmiType. Now the possibleAmiTypes[0] would still be AL2_ARM_64. No breaking change.

Case 2:

instanceTypes: ['t4g.medium']
amiType: undefined
launchTemplateSpec: something (with custom AMI)

We just use props.amiType. It's user responsibility to leave props.amiType undefined if custom AMI is defined in the launch template.

@pahud
Copy link
Copy Markdown
Contributor Author

pahud commented Dec 13, 2021

Hi @neilkuan @kirintwn I am ready. Can you take another look?

@pahud pahud marked this pull request as ready for review December 13, 2021 17:07
Copy link
Copy Markdown
Contributor

@neilkuan neilkuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can you also add testing for got this error message instanceTypes of different CPU architectures is not allowed ?!
    ex:
 this.cluster.addNodegroupCapacity('BottlerocketNG1', {
      instanceTypes: [new ec2.InstanceType('t4g.large'), new ec2.InstanceType('t2.large')],
    });
  1. Can you add the c7g in to instance-types.ts for graviton3?!
    other LGTM.

@pahud
Copy link
Copy Markdown
Contributor Author

pahud commented Dec 14, 2021

  1. Can you also add testing for got this error message instanceTypes of different CPU architectures is not allowed ?!
    ex:
 this.cluster.addNodegroupCapacity('BottlerocketNG1', {
      instanceTypes: [new ec2.InstanceType('t4g.large'), new ec2.InstanceType('t2.large')],
    });

The test is already there.
https://github.com/pahud/aws-cdk/blob/d3acc17b3fa9e9c928d05ff9e85f4f1fb508e053/packages/%40aws-cdk/aws-eks/test/nodegroup.test.ts#L728-L745

  1. Can you add the c7g in to instance-types.ts for graviton3?!

c7g is still in preview so I am not adding it in this PR.

https://aws.amazon.com/tw/ec2/instance-types/c7g/

other LGTM.

@pahud
Copy link
Copy Markdown
Contributor Author

pahud commented Dec 14, 2021

@neilkuan @kirintwn all addressed. Please take another look. Thanks.

@pahud
Copy link
Copy Markdown
Contributor Author

pahud commented Dec 14, 2021

I have revised the implementation based on your suggested changes. I tend to not modify aws-ec2 in this PR so I have a tiny isGpuInstanceType() helper function.

Yeah, I see how changing InstanceType would break existing code. But still, there is an opportunity for simplification here. Something like:

type MyArchitecture = InstanceArchitecture | 'GPU';

function getPossibleAmiTypes(instanceTypes: InstanceType[]): NodegroupAmiType[] {
  function typeToArch(instanceType: InstanceType): MyArchitecture {
    return isGpuInstanceType(instanceType) ? 'GPU' : instanceType.architecture;
  }

  const archAmiMap = new Map<MyArchitecture, NodegroupAmiType[]>([
    [InstanceArchitecture.ARM_64, arm64AmiTypes],
    [InstanceArchitecture.X86_64, x8664AmiTypes],
    ['GPU', gpuAmiTypes],
  ]);

  const architectures: Set<MyArchitecture> = new Set(instanceTypes.flatMap(typeToArch));

  if (architectures.size === 0) {
    throw new Error(`Cannot determine any ami type comptaible with instance types: ${instanceTypes.map(i => i.toString).join(',')}`);
  }

  if (architectures.size > 1) {
    throw new Error('instanceTypes of different architectures is not allowed');
  }

  return archAmiMap.get(Array.from(architectures)[0])!;
}

@pahud
Copy link
Copy Markdown
Contributor Author

pahud commented Dec 16, 2021

I have revised the implementation based on your suggested changes. I tend to not modify aws-ec2 in this PR so I have a tiny isGpuInstanceType() helper function.

Yeah, I see how changing InstanceType would break existing code. But still, there is an opportunity for simplification here. Something like:

type MyArchitecture = InstanceArchitecture | 'GPU';

function getPossibleAmiTypes(instanceTypes: InstanceType[]): NodegroupAmiType[] {
  function typeToArch(instanceType: InstanceType): MyArchitecture {
    return isGpuInstanceType(instanceType) ? 'GPU' : instanceType.architecture;
  }

  const archAmiMap = new Map<MyArchitecture, NodegroupAmiType[]>([
    [InstanceArchitecture.ARM_64, arm64AmiTypes],
    [InstanceArchitecture.X86_64, x8664AmiTypes],
    ['GPU', gpuAmiTypes],
  ]);

  const architectures: Set<MyArchitecture> = new Set(instanceTypes.flatMap(typeToArch));

  if (architectures.size === 0) {
    throw new Error(`Cannot determine any ami type comptaible with instance types: ${instanceTypes.map(i => i.toString).join(',')}`);
  }

  if (architectures.size > 1) {
    throw new Error('instanceTypes of different architectures is not allowed');
  }

  return archAmiMap.get(Array.from(architectures)[0])!;
}

@otaviomacedo This is interesting! Let me try it.

@pahud
Copy link
Copy Markdown
Contributor Author

pahud commented Dec 16, 2021

@otaviomacedo all fixed. Please take a look.

@pahud pahud requested a review from otaviomacedo December 16, 2021 01:33
@pahud
Copy link
Copy Markdown
Contributor Author

pahud commented Dec 16, 2021

@kirintwn all resolved :-)

@kirintwn
Copy link
Copy Markdown
Contributor

kirintwn commented Dec 16, 2021

@pahud:

And is it okay to let the docstring separated from the function with this type declaration?
Will it breaks auto-generation mechanism of documentation?

/**
* This function examines the CPU architecture of every instance type and determines
* what AMI types are compatible for all of them. it either throws or produces an array of possible AMI types because
* instance types of different CPU architectures are not supported.
* @param instanceTypes The instance types
* @returns NodegroupAmiType[]
*/
type AmiArchitecture = InstanceArchitecture | 'GPU';
function getPossibleAmiTypes(instanceTypes: InstanceType[]): NodegroupAmiType[] {

can you confirm this? thanks!

@pahud
Copy link
Copy Markdown
Contributor Author

pahud commented Dec 17, 2021

can you confirm this? thanks!

@kirintwn nice catch! all fixed!

Maybe another look? copy @otaviomacedo

Copy link
Copy Markdown
Contributor

@kirintwn kirintwn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Dec 17, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

@aws-cdk-automation
Copy link
Copy Markdown
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildProject89A8053A-LhjRyN9kxr8o
  • Commit ID: 5bda204
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mergify mergify bot merged commit b7be71c into aws:master Dec 17, 2021
@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Dec 17, 2021

Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).

TikiTDO pushed a commit to TikiTDO/aws-cdk that referenced this pull request Feb 21, 2022
As `Nodegroup` now supports different AMI types including Bottlerocket for both x86_64 and ARM_64, we cannot determine correct amiType simply from the `instanceTypes` property(aws#17641 ). However, when `instanceTypes` are provided, we still can check: 

1. if instance types of different CPU architectures are mxied and throw an error
2. if the provided `amyType` compatible with the instanceTypes


If user opt in Bottlerocket or any other AMI types other than AL2, users have to specify the `amiType` explicitly. If it's unspecified, we will use AL2 implicitly to avoid breaking changes, which is the default behavior previously.

The only case `amiType` has to be undefined is that when custom AMI is defined in the launch template. As we can't check this case, users have to explicitly leave it undefined. We add a notice in the property doc string for this.


Related to aws#12441
Fixes: aws#17641 

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service

Projects

None yet

Development

Successfully merging this pull request may close these issues.

(eks): Unable to use Bottlerocket for managed nodegroup

5 participants