hostUsers: false and VOLUME [ "/run" ] lead to create mountpoint for /var/run/secrets/kubernetes.io/serviceaccount mount: mkdirat /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/test-volume/rootfs/run/secrets: permission denied #11852

@adelton

Description

When an image defines VOLUME [ "/run" ], an attempt to run a user-namespaced container from it with hostUsers: false via K3s fails with the message

Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/lib/kubelet/pods/9587ef35-67eb-4fcb-9b57-a28fc23bd1fb/volumes/kubernetes.io~projected/kube-api-access-94hrt" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount": create mountpoint for /var/run/secrets/kubernetes.io/serviceaccount mount: mkdirat /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/test-volume/rootfs/run/secrets: permission denied

To get the container running, /run has to be specified explicitly as an emptyDir volume mount in the pod.

The problem is present with the containerd and runc shipped in stock K3s (v2.0.4-k3s2). I also reproduced it with containerd 2.1 and runc 1.3.0, as well as with crun 1.21.

Steps to reproduce the issue

  1. Have an Ubuntu 24.04 machine (VM).
  2. export INSTALL_K3S_EXEC="--kubelet-arg feature-gates=UserNamespacesSupport=true --kube-apiserver-arg feature-gates=UserNamespacesSupport=true --kube-controller-manager-arg feature-gates=UserNamespacesSupport=true --kube-scheduler-arg feature-gates=UserNamespacesSupport=true"
  3. curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig ~/.kube/config
  4. sudo chown -R $( id -u ):$( id -g ) ~/.kube
  5. sudo apt update ; sudo apt install -y buildah
  6. Have a Dockerfile with

         FROM docker.io/library/alpine:latest
         VOLUME [ "/run" ]

  7. buildah build -t docker-archive:/tmp/test-volume.tar:localhost/test-volume
  8. sudo k3s ctr images import - < /tmp/test-volume.tar
  9. Have a test-volume.yaml with

         apiVersion: v1
         kind: Pod
         metadata:
           name: test-volume
         spec:
           restartPolicy: Never
           hostUsers: false
           containers:
           - name: test-volume
             image: localhost/test-volume
             imagePullPolicy: Never
             command: [ "cat", "/proc/self/uid_map" ]

  10. kubectl apply -f test-volume.yaml
  11. kubectl logs test-volume
  12. kubectl describe pod/test-volume | tail -2
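The INSTALL_K3S_EXEC value in step 2 passes the same UserNamespacesSupport feature gate to each of the four Kubernetes components. If the gate list ever needs to change, a small Python sketch like the one below can assemble the string; the function name `install_k3s_exec` is hypothetical and not part of the K3s installer, which only reads the resulting environment variable.

```python
# Components that receive the feature gate, in the order used in step 2.
COMPONENTS = ["kubelet", "kube-apiserver", "kube-controller-manager", "kube-scheduler"]


def install_k3s_exec(gates: dict[str, bool]) -> str:
    """Build an INSTALL_K3S_EXEC value that passes the same feature gates
    to every component through its --<component>-arg flag."""
    # feature-gates accepts comma-separated Name=true/false pairs.
    gate_value = ",".join(f"{name}={str(enabled).lower()}" for name, enabled in gates.items())
    return " ".join(f"--{component}-arg feature-gates={gate_value}" for component in COMPONENTS)


if __name__ == "__main__":
    print(install_k3s_exec({"UserNamespacesSupport": True}))
```

With a single gate this reproduces the exact string from step 2, so it could be used as `export INSTALL_K3S_EXEC="$( python3 script.py )"`.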

Describe the results you received and expected

Expected:

Something like

         0  383975424      65536

indicating that the container runs in a user namespace, and

  Normal  Created    5s    kubelet            Created container: test-volume
  Normal  Started    5s    kubelet            Started container test-volume

with no error.
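Each line of /proc/self/uid_map has three fields: the first UID inside the namespace, the UID it maps to in the parent namespace, and the length of the range. A process in the initial user namespace sees the identity mapping `0 0 4294967295`, so a line such as `0  383975424  65536` shows that container root is mapped to an unprivileged host UID. A minimal Python sketch of that check follows; `parse_uid_map` and `is_user_namespaced` are hypothetical helpers, and treating anything other than the full identity mapping as "user-namespaced" is a simplifying assumption.

```python
from typing import NamedTuple


class UidRange(NamedTuple):
    inside: int   # first UID as seen inside the namespace
    outside: int  # first UID in the parent namespace that it maps to
    length: int   # number of consecutive UIDs in the range


def parse_uid_map(text: str) -> list[UidRange]:
    """Parse the whitespace-separated contents of /proc/<pid>/uid_map."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        inside, outside, length = (int(f) for f in fields)
        ranges.append(UidRange(inside, outside, length))
    return ranges


def is_user_namespaced(text: str) -> bool:
    """Return True unless the map is the identity mapping of the initial user namespace."""
    return parse_uid_map(text) != [UidRange(0, 0, 4294967295)]
```

Feeding it the output of kubectl logs test-volume would then distinguish the expected result from a pod that silently ran without a user namespace.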

Actual:

kubectl logs test-volume produces no output, and kubectl describe shows

  Normal   Created    5s    kubelet            Created container: test-volume
  Warning  Failed     5s    kubelet            Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/lib/kubelet/pods/9587ef35-67eb-4fcb-9b57-a28fc23bd1fb/volumes/kubernetes.io~projected/kube-api-access-94hrt" to rootfs at "/var/run/secrets/kubernetes.io/serviceaccount": create mountpoint for /var/run/secrets/kubernetes.io/serviceaccount mount: mkdirat /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/test-volume/rootfs/run/secrets: permission denied
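The mount target in the error is /var/run/secrets/kubernetes.io/serviceaccount, yet mkdirat fails on rootfs/run/secrets. That suggests /var/run is a symlink to /run in the Alpine image, so the service account mount point has to be created under /run, where the image's VOLUME lives. A minimal Python sketch of that path resolution, assuming a single-level symlink table; the table and `resolve_in_rootfs` are illustrative only, not containerd or runc code, and chained symlinks are not followed.

```python
import posixpath


def resolve_in_rootfs(path: str, symlinks: dict[str, str]) -> str:
    """Resolve an absolute container path component by component,
    replacing any directory listed in `symlinks` with its target.
    Relative targets are interpreted against the symlink's parent.
    Chained symlinks (a target that is itself a symlink) are not followed."""
    resolved = "/"
    for part in path.split("/"):
        if not part:
            continue  # skip empty components from leading or doubled slashes
        candidate = posixpath.join(resolved, part)
        target = symlinks.get(candidate)
        if target is not None:
            # posixpath.join discards `resolved` when `target` is absolute.
            candidate = posixpath.normpath(posixpath.join(resolved, target))
        resolved = candidate
    return resolved


if __name__ == "__main__":
    # Assumed image layout: /var/run -> /run, as the mkdirat path in the error suggests.
    print(resolve_in_rootfs("/var/run/secrets/kubernetes.io/serviceaccount", {"/var/run": "/run"}))
```

Under that assumption the script prints /run/secrets/kubernetes.io/serviceaccount, matching the rootfs/run/secrets directory that mkdirat could not create.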

What version of containerd are you using?

containerd github.com/containerd/containerd/v2 v2.1.0 061792f

Any other relevant information

I originally filed this issue against K3s as k3s-io/k3s#12332 and was advised to file it with containerd instead, so this is essentially a copy of that issue.

I am able to make things work by adding

    volumeMounts:
    - mountPath: /run
      name: run-volume
  volumes:
  - name: run-volume
    emptyDir: {}

to test-volume.yaml.
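When manifests are generated for several images that declare VOLUME [ "/run" ], the same workaround can be applied programmatically. A minimal Python sketch, assuming the Pod manifest has already been loaded into a dict (for instance with PyYAML); `add_run_emptydir` is a hypothetical helper, it only handles regular containers (not initContainers), and the default volume name is taken from the snippet above.

```python
import copy


def add_run_emptydir(pod: dict, volume_name: str = "run-volume") -> dict:
    """Return a copy of `pod` in which every container mounts an emptyDir at /run.

    Containers that already mount something at /run are left unchanged, and the
    emptyDir volume is added at most once, so applying the function twice is harmless.
    """
    pod = copy.deepcopy(pod)  # never mutate the caller's manifest
    spec = pod.setdefault("spec", {})
    needs_volume = False
    for container in spec.get("containers", []):
        mounts = container.setdefault("volumeMounts", [])
        if not any(m.get("mountPath") == "/run" for m in mounts):
            mounts.append({"mountPath": "/run", "name": volume_name})
            needs_volume = True
    if needs_volume:
        volumes = spec.setdefault("volumes", [])
        if not any(v.get("name") == volume_name for v in volumes):
            volumes.append({"name": volume_name, "emptyDir": {}})
    return pod
```

Applied to the test-volume Pod from the reproducer, this yields the same volumeMounts and volumes entries as the manual edit.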

However, I would expect VOLUME [ "/run" ] to behave effectively the same way, so that user-namespaced containers do not fail on such an image.

This is a minimized reproducer of an issue we have seen with https://github.com/freeipa/freeipa-container on K3s with containerd. We declare VOLUME [ "/run" ] in the image to make that systemd-based image easier to use with

    securityContext:
      readOnlyRootFilesystem: true

Show configuration if it is related to the CRI plugin.

version = 3

[plugins.'io.containerd.cri.v1.runtime'.cni]
  bin_dirs = [ "/var/lib/rancher/k3s/data/cni" ]
  conf_dir = "/var/lib/rancher/k3s/agent/etc/cni/net.d"

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc]
  cgroup_writable = true

[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
  SystemdCgroup = true

Metadata

Assignees: No one assigned
Type: No type
Status: Done
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests