
[WIP] Amazon Linux 2023 proof-of-concept#1212

Closed
cartermckinnon wants to merge 3 commits into master from al2023-poc

@cartermckinnon cartermckinnon commented Mar 4, 2023

Description of changes:

This is an initial PoC for an AL2023-based EKS worker AMI.

Changelog:

  • rsa SSH keys not supported by default on AL2023, switched to ed25519.
  • Removed Docker from template variables.
  • Removed upgrade_kernel.sh provisioner -- AL2023 is on kernel 6.1 by default.
  • Removed curl in favor of curl-minimal.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@stewartsmith

What protocol do you need that is not provided by curl-minimal ?

Comment thread eks-worker-al2023-variables.json Outdated
"runc_version": "*",
"security_group_id": "",
"sonobuoy_e2e_registry": "",
"source_ami_filter_name": "al2023-ami-minimal-2023.0.*-kernel-6.1-x86_64",

Why not use the SSM parameter?

This will break as soon as 2023.1 is released, 3 months after GA.
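A minimal sketch of the SSM-parameter approach suggested here, instead of the hard-coded name filter. The parameter path layout is an assumption based on the public AL2023 parameter namespace and should be verified in your region; the function names `al2023_ssm_parameter` and `resolve_al2023_ami` are hypothetical and do not come from this PR.

```shell
#!/bin/sh
# Sketch: resolve the source AMI from a public SSM parameter rather than a
# name filter that pins "2023.0.*". The path layout is an assumption.

# Build the SSM parameter name for an AL2023 AMI.
#   $1: "standard" or "minimal"
#   $2: architecture, e.g. x86_64 or arm64
al2023_ssm_parameter() (
  variant=$1
  arch=$2
  prefix="/aws/service/ami-amazon-linux-latest/al2023-ami"
  case "$variant" in
    minimal)  echo "${prefix}-minimal-kernel-default-${arch}" ;;
    standard) echo "${prefix}-kernel-default-${arch}" ;;
    *) echo "unknown variant: $variant" >&2; exit 1 ;;
  esac
)

# Look up the AMI ID currently published under that parameter.
# Needs the AWS CLI and credentials, so it is only defined here, not run.
resolve_al2023_ami() (
  name=$(al2023_ssm_parameter "$1" "$2") || exit 1
  aws ssm get-parameter --name "$name" --query 'Parameter.Value' --output text
)
```

Calling `resolve_al2023_ami minimal x86_64` would print an AMI ID that can be handed to the build as an explicit source AMI, so the template does not depend on a release-specific name pattern.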

@stewartsmith stewartsmith left a comment

Additional things that should change for AL2023

  • AWS CLI v2 does not need to be pulled from external locations, the packaged aws-cli is version 2.
  • I'm not convinced the sshd configuration is needed / doing something intentional here? Is there something that is not okay with the AL2023 default configs?
  • why is the clocksource being switched on xen? Is there good information on this? A ticket?
  • Why is there a need to have curl rather than curl-minimal which is there already? Are you downloading containers via gopher or something?
  • yum-utils is replaced by dnf-utils - what specifically do you need from there though?
  • instead of yum-plugin-versionlock the modern way is to install dnf-command(versionlock)
  • is nfs-utils what you're looking for? Or is it really the EFS utilities?
  • why do you need to install wget at all? What is the need for wget and curl ? It looks like the only use here is to download binaries from S3 and github? curl can do that and reduce the patching burden on customers.
  • the "Update the OS to begin with to catch up to the latest packages" invocation will not do this on AL2023. It would be better to have the build run off of an explicit AMI name, and have the option to have dnf --releasever=$RELEASE_TO_UPGRADE_TO update -y in the build to help with anything emergent.
  • are you sure the ec2-net-utils removal is still needed? Network setup works rather differently in AL2023. It also is likely worth changing the configuration rather than removing.
  • Do you need to explicitly version lock the individual packages when you're already going to be version locked to a specific version of the OS?
  • Why do you need device mapper and LVM by default?
  • the amazon-linux-extras enable docker is not going to work on AL2023
  • Are you sure the groupadd and useradd are required?
  • The "Enable docker daemon to start on boot" command of "sudo systemctl daemon-reload" is.... not how that works.
  • If you are setting up logrotate, then you're going to have to install logrotate. But it appears that instead things are in fact using journald, in which case, don't configure logrotate.
  • There's going to be a better name than /etc/sysctl.d/99-amazon.conf for the sysctl settings. Are you sure these changes are still needed on AL2023? Overcommit is going to be always enabled because otherwise it's just too funny, and we do panic on oops.
  • The inotify and vm.max_map_count sysctl settings should be in /etc/sysctl.d files rather than adding to /etc/sysctl.conf
  • I am almost certain the log-collector-script does not belong in /etc
  • I am pretty sure there's a better way to disable package upgrade on instance launch than running sed over /etc/cloud/cloud.cfg - especially as this is something we do not do on AL2023.
  • chkconfig should not be used for anything, and is not installed by default
  • are you sure you need to configure chrony ?
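Two of the suggestions above (sysctl settings in `/etc/sysctl.d` files, and an explicit `dnf --releasever` update) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the file name, keys and values are supplied by the caller rather than taken from this PR.

```shell
#!/bin/sh
# Sketch of two review suggestions. File names, keys and values passed in by
# the caller are illustrative placeholders, not settings from this PR.

# Write sysctl settings to a drop-in file instead of appending to
# /etc/sysctl.conf.
#   $1: target file, e.g. a file under /etc/sysctl.d/
#   $2...: settings in key=value form
write_sysctl_dropin() (
  target=$1
  shift
  : > "$target" || exit 1            # create or truncate the drop-in
  for setting in "$@"; do
    case "$setting" in
      *=*) printf '%s\n' "$setting" >> "$target" ;;
      *) echo "not key=value: $setting" >&2; exit 1 ;;
    esac
  done
)

# Print (not run) the dnf command that moves the build to an explicit release.
#   $1: release version to upgrade to
releasever_update_command() {
  printf 'dnf --releasever=%s update -y\n' "$1"
}
```

After a drop-in is written as root, `sysctl --system` reloads the settings from all configuration files. Printing the dnf command rather than running it keeps the sketch free of side effects; a build script would run it with the release it intends to pin to.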

@cartermckinnon
Contributor Author

Thanks for combing through this, @stewartsmith! This is a very rough draft I threw together to see how far the existing template was from a successful build with an AL2023 base. I'll follow up on the items one-by-one, but I expect to remove many oddities that have accumulated over time.

@stevehipwell
Contributor

@cartermckinnon would it be better if the AL2023 version was a new repo? Given the new AL release cadence it seems likely that multiple AL versions will be relevant at a given time. By changing repos it'd also allow for tooling changes without impacting the previous version.

@DekusDenial

So AL2023 is GA: https://aws.amazon.com/blogs/aws/amazon-linux-2023-a-cloud-optimized-linux-distribution-with-long-term-support/ Are we expecting to see more effort put into this as an EKS AMI?

@ozbenh

ozbenh commented Mar 16, 2023

Note: For installing curl (or gnupg) on top of the -minimal variant, the best approach is to use dnf swap, for example:

sudo dnf swap curl-minimal curl
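Before swapping, it may be worth checking whether the installed curl already speaks the protocol the build needs, which is the question raised earlier in the thread. The sketch below reads the `Protocols:` line of `curl --version`; the function names are hypothetical and nothing here asserts which protocols curl-minimal actually ships with.

```shell
#!/bin/sh
# Sketch: decide whether the full curl package is needed by checking which
# protocols the installed curl reports.

# Read `curl --version` output on stdin; succeed if protocol $1 is listed.
curl_supports_protocol() (
  wanted=$1
  # Keep only the space-separated protocol names from the "Protocols:" line.
  protocols=$(sed -n 's/^Protocols: //p')
  for p in $protocols; do
    [ "$p" = "$wanted" ] && exit 0
  done
  exit 1
)

# Print the swap command from the comment above only when a protocol is missing.
#   $1: protocol name, e.g. https or sftp
suggest_curl_swap() {
  if curl --version | curl_supports_protocol "$1"; then
    echo "curl already supports $1"
  else
    echo "sudo dnf swap curl-minimal curl"
  fi
}
```

For example, `suggest_curl_swap https` prints the swap command only if the installed curl does not list `https`.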

@cartermckinnon cartermckinnon changed the title from "Amazon Linux 2023 proof-of-concept" to "[WIP] Amazon Linux 2023 proof-of-concept" Apr 17, 2023
@dims dims mentioned this pull request Apr 28, 2023
@glachac-safelishare

@cartermckinnon

Have you tried this and had it work? I had to add iptables-legacy to get bootstrap.sh to run. However coredns continually errors out and restarts. There is no pod-to-pod networking that works - and I can't seem to figure out why.

[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
.:53
[INFO] plugin/reload: Running configuration SHA512 = 74129e6fb7c506b394c384adbdec8b507dec173739e054b346299b79b229f2ce2342b444c4298b2d6c85b6d7302ee317ae1fd6ea3eff84c03129413ddcdda3ba
CoreDNS-1.9.3
linux/amd64, go1.19.5, b8f104d1
[INFO] 127.0.0.1:35475 - 57502 "HINFO IN 6847426682292060063.1664824736373062434. udp 57 false 512" - - 0 6.001588443s
[ERROR] plugin/errors: 2 6847426682292060063.1664824736373062434. HINFO: read udp 192.168.21.51:38767->192.168.0.2:53: i/o timeout
[INFO] 127.0.0.1:35034 - 21288 "HINFO IN 6847426682292060063.1664824736373062434. udp 57 false 512" - - 0 6.001696281s
[ERROR] plugin/errors: 2 6847426682292060063.1664824736373062434. HINFO: read udp 192.168.21.51:37140->192.168.0.2:53: i/o timeout
[INFO] 127.0.0.1:50233 - 61768 "HINFO IN 6847426682292060063.1664824736373062434. udp 57 false 512" - - 0 2.000631216s
[ERROR] plugin/errors: 2 6847426682292060063.1664824736373062434. HINFO: read udp 192.168.21.51:48565->192.168.0.2:53: i/o timeout

The related errors seem to be a CNI problem.

May 17 19:10:22 ip-192-168-20-231.us-east-2.compute.internal kubelet[2326]: E0517 19:10:22.972813    2326 kubelet.go:2376] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
May 17 19:10:28 ip-192-168-20-231.us-east-2.compute.internal kubelet[2326]: E0517 19:10:28.226863    2326 kubelet.go:2376] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
May 17 19:10:33 ip-192-168-20-231.us-east-2.compute.internal kubelet[2326]: E0517 19:10:33.227694    2326 kubelet.go:2376] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
May 17 19:10:38 ip-192-168-20-231.us-east-2.compute.internal kubelet[2326]: E0517 19:10:38.228454    2326 kubelet.go:2376] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
May 17 19:10:43 ip-192-168-20-231.us-east-2.compute.internal kubelet[2326]: E0517 19:10:43.228781    2326 kubelet.go:2376] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
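One thing that could be checked for the "cni plugin not initialized" message is whether any CNI network configuration has been written on the node at all. The sketch below is only a diagnostic aid under assumptions: the helper name `has_cni_config` is hypothetical, `/etc/cni/net.d` is the commonly used default directory rather than something confirmed in this thread, and a missing config is one possible cause, not a diagnosis of this report.

```shell
#!/bin/sh
# Sketch: succeed if a directory holds at least one CNI network config file.
#   $1: config directory (commonly /etc/cni/net.d; an assumption here)
has_cni_config() (
  dir=$1
  # Unmatched globs stay literal, so each candidate is tested with -f.
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    [ -f "$f" ] && exit 0
  done
  exit 1
)
```

On a node this might be used as `has_cni_config /etc/cni/net.d || echo "no CNI config found"`. If nothing is there, the next place to look would be the pods of the CNI plugin itself, since they are normally what writes that file.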

@dims
Contributor

dims commented Jun 12, 2023

@cartermckinnon a few more things here:
master...dims:amazon-eks-ami:al2023-poc-dims

(also rebased to master!)

Only thing i had to do was to add the role to the configmap as it was not getting added for some reason:
https://karpenter.sh/v0.26/troubleshooting/#:~:text=CSINode%20publishing%3A%20Unauthorized-,Check%20the%20ConfigMap,-to%20check%20whether
(This was me being impatient and ctrl-c'ed the eksctl command before it could add things into aws-auth config map)

snippet from kubectl get nodes -o yaml shows:

    nodeInfo:
      architecture: amd64
      bootID: 1d95ba81-e95c-4073-8a77-a2351489ee07
      containerRuntimeVersion: containerd://1.6.19
      kernelVersion: 6.1.29-47.49.amzn2023.x86_64
      kubeProxyVersion: v1.27.1-eks-2f008fe
      kubeletVersion: v1.27.1-eks-2f008fe
      machineID: ec220096a17ae2fd59153f731a6b9346
      operatingSystem: linux
      osImage: Amazon Linux 2023
      systemUUID: ec220096-a17a-e2fd-5915-3f731a6b9346

PS: eksctl config is here

@cartermckinnon
Contributor Author

Closing this PR in favor of #1340.

@cartermckinnon cartermckinnon deleted the al2023-poc branch July 6, 2023 23:06
7 participants