This repository was archived by the owner on Oct 15, 2024. It is now read-only.

Issue with container running as root #8

@vsoch

Description

Hiya! I found this repository because I'm creating an EKS cluster with eksctl, specifically like this:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: flux-cluster
  region: us-east-2
  version: "1.23"
  

availabilityZones: ["us-east-2b", "us-east-2c"]
managedNodeGroups:
  - name: workers
    instanceType: hpc6a.48xlarge
    minSize: 64
    maxSize: 64
    labels: { "fluxoperator": "true" }
    availabilityZones: ["us-east-2b"]
    efaEnabled: true
    placement:
      groupName: eks-efa-testing

And when I request a job asking for EFA for my pods, e.g. (this is our operator CRD, which has worked before):

# Resource limits to enable efa
resources:
    limits:
        vpc.amazonaws.com/efa: 1
        memory: "340G"
        cpu: 94

the pods are stuck in pending. Further inspection reveals:

Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  27s (x11 over 13m)  default-scheduler  0/64 nodes are available: 64 Insufficient vpc.amazonaws.com/efa.

And then I realized I could inspect the pod that is supposed to provide the EFA devices (which is where I found the container name / config that is provided in the manifest folder of this repo), and I saw:

$ kubectl describe pods -n kube-system aws-efa-k8s-device-plugin-daemonset-zpg2s
Name:                 aws-efa-k8s-device-plugin-daemonset-zpg2s
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      default
Node:                 ip-192-168-31-140.us-east-2.compute.internal/192.168.31.140
Start Time:           Mon, 30 Jan 2023 17:29:25 -0700
Labels:               controller-revision-hash=5cd48c4575
                      name=aws-efa-k8s-device-plugin
                      pod-template-generation=1
Annotations:          kubernetes.io/psp: eks.privileged
                      scheduler.alpha.kubernetes.io/critical-pod: 
Status:               Pending
IP:                   192.168.31.140
IPs:
  IP:           192.168.31.140
Controlled By:  DaemonSet/aws-efa-k8s-device-plugin-daemonset
Containers:
  aws-efa-k8s-device-plugin:
    Container ID:   
    Image:          602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/aws-efa-k8s-device-plugin:v0.3.3
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CreateContainerConfigError
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/kubelet/device-plugins from device-plugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-m82qz (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  device-plugin:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/device-plugins
    HostPathType:  
  kube-api-access-m82qz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             aws.amazon.com/efa:NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  66m                   default-scheduler  Successfully assigned kube-system/aws-efa-k8s-device-plugin-daemonset-zpg2s to ip-192-168-31-140.us-east-2.compute.internal
  Normal   Pulling    66m                   kubelet            Pulling image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/aws-efa-k8s-device-plugin:v0.3.3"
  Normal   Pulled     66m                   kubelet            Successfully pulled image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/aws-efa-k8s-device-plugin:v0.3.3" in 4.231578378s
  Warning  Failed     64m (x12 over 66m)    kubelet            Error: container has runAsNonRoot and image will run as root (pod: "aws-efa-k8s-device-plugin-daemonset-zpg2s_kube-system(1b46d2ac-c922-449b-b630-bab344976d9f)", container: aws-efa-k8s-device-plugin)
  Normal   Pulled     115s (x303 over 66m)  kubelet            Container image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/aws-efa-k8s-device-plugin:v0.3.3" already present on machine

Specifically, notice the second-to-last line: there is an error about "runAsNonRoot":

  Warning  Failed     64m (x12 over 66m)    kubelet            Error: container has runAsNonRoot and image will run as root (pod: "aws-efa-k8s-device-plugin-daemonset-zpg2s_kube-system(1b46d2ac-c922-449b-b630-bab344976d9f)", container: aws-efa-k8s-device-plugin)

I am thinking this might be related to eksctl, if it's creating / submitting this yaml, but since I found what appears to be the same efa container here, I thought I would ask! Is there perhaps a spot fix I could do, re-applying this yaml to ask to run as root? 🤔
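To be concrete about the kind of spot fix I mean: something like the fragment below, which overrides the container's securityContext so the root user in the image is allowed. This is just a sketch — I'm assuming the DaemonSet (or a pod security policy) is currently enforcing `runAsNonRoot: true`, and the other fields are copied from the pod description above:

```yaml
# Hypothetical override for the EFA device-plugin DaemonSet.
# Assumes runAsNonRoot: true is currently set somewhere in the spec;
# this explicitly permits the image's root user instead.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: aws-efa-k8s-device-plugin-daemonset
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: aws-efa-k8s-device-plugin
          securityContext:
            runAsNonRoot: false  # allow the container to run as root (UID 0)
            runAsUser: 0
```

I could re-apply that with `kubectl apply` over the manifest from this repo, if that's the right place to change it.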
