Issue with container running as root #8
Description
Hiya! I found this repository because I'm creating an EKS cluster with eksctl, specifically like this:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: flux-cluster
  region: us-east-2
  version: "1.23"

availabilityZones: ["us-east-2b", "us-east-2c"]
managedNodeGroups:
  - name: workers
    instanceType: hpc6a.48xlarge
    minSize: 64
    maxSize: 64
    labels: { "fluxoperator": "true" }
    availabilityZones: ["us-east-2b"]
    efaEnabled: true
    placement:
      groupName: eks-efa-testing

And when I request a job asking for efa for my pods, e.g. (this is our operator CRD that has worked before):
# Resource limits to enable efa
resources:
  limits:
    vpc.amazonaws.com/efa: 1
    memory: "340G"
    cpu: 94

the pods are stuck in Pending. Further inspection reveals:
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  27s (x11 over 13m)  default-scheduler  0/64 nodes are available: 64 Insufficient vpc.amazonaws.com/efa.

And then I realized I could describe the pod that is supposed to provide the efa (which is where I found the container name / config that is provided in the manifest folder of this repo) and I saw:
$ kubectl describe pods -n kube-system aws-efa-k8s-device-plugin-daemonset-zpg2s
Name:                 aws-efa-k8s-device-plugin-daemonset-zpg2s
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      default
Node:                 ip-192-168-31-140.us-east-2.compute.internal/192.168.31.140
Start Time:           Mon, 30 Jan 2023 17:29:25 -0700
Labels:               controller-revision-hash=5cd48c4575
                      name=aws-efa-k8s-device-plugin
                      pod-template-generation=1
Annotations:          kubernetes.io/psp: eks.privileged
                      scheduler.alpha.kubernetes.io/critical-pod:
Status:               Pending
IP:                   192.168.31.140
IPs:
  IP:           192.168.31.140
Controlled By:  DaemonSet/aws-efa-k8s-device-plugin-daemonset
Containers:
  aws-efa-k8s-device-plugin:
    Container ID:
    Image:          602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/aws-efa-k8s-device-plugin:v0.3.3
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CreateContainerConfigError
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/kubelet/device-plugins from device-plugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-m82qz (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  device-plugin:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/device-plugins
    HostPathType:
  kube-api-access-m82qz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             aws.amazon.com/efa:NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  66m                   default-scheduler  Successfully assigned kube-system/aws-efa-k8s-device-plugin-daemonset-zpg2s to ip-192-168-31-140.us-east-2.compute.internal
  Normal   Pulling    66m                   kubelet            Pulling image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/aws-efa-k8s-device-plugin:v0.3.3"
  Normal   Pulled     66m                   kubelet            Successfully pulled image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/aws-efa-k8s-device-plugin:v0.3.3" in 4.231578378s
  Warning  Failed     64m (x12 over 66m)    kubelet            Error: container has runAsNonRoot and image will run as root (pod: "aws-efa-k8s-device-plugin-daemonset-zpg2s_kube-system(1b46d2ac-c922-449b-b630-bab344976d9f)", container: aws-efa-k8s-device-plugin)
  Normal   Pulled     115s (x303 over 66m)  kubelet            Container image "602401143452.dkr.ecr.us-east-2.amazonaws.com/eks/aws-efa-k8s-device-plugin:v0.3.3" already present on machine

Specifically, notice the second-to-last line, which reports an error about "runAsNonRoot":
Warning Failed 64m (x12 over 66m) kubelet Error: container has runAsNonRoot and image will run as root (pod: "aws-efa-k8s-device-plugin-daemonset-zpg2s_kube-system(1b46d2ac-c922-449b-b630-bab344976d9f)", container: aws-efa-k8s-device-plugin)
I am thinking this might be related to eksctl, if it's creating / submitting this yaml, but since I found what appears to be the same efa container here, I thought I would ask! Is there perhaps a spot fix I could do, re-applying this yaml to ask to run as root? 🤔
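
For context on the kind of spot fix being asked about: the kubelet reports this error when the effective security context sets runAsNonRoot: true but the image's default user is root. A minimal sketch of an override follows, assuming the DaemonSet's pod template can be edited and re-applied, and assuming the plugin is meant to run as root. Neither assumption is confirmed by the manifest in this repo, and the field values are illustrative only.

```yaml
# Hypothetical fragment of the DaemonSet pod template.
# The securityContext values below are assumptions for illustration,
# not taken from the manifest in this repo or from eksctl.
spec:
  template:
    spec:
      containers:
        - name: aws-efa-k8s-device-plugin
          securityContext:
            runAsNonRoot: false  # do not reject an image whose default user is root
            runAsUser: 0         # assumption: the plugin expects to run as UID 0
```

A container-level securityContext overrides overlapping pod-level settings, so a fragment like this would take effect even if runAsNonRoot: true were set on the pod spec. Whether a cluster policy would still block it is not something this sketch addresses.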