
SPIRE fails to start on recent versions #40533

@madchap

Description

Is there an existing issue for this?

  • I have searched the existing issues

Version

Equal to or higher than v1.17.5 and lower than v1.18.0

What happened?

Bringing up an EKS cluster with Cilium from scratch, using SPIRE v1.12.4 instead of the default v1.9.6, fails with:

time="2025-07-15T12:07:21Z" level=info msg="DataStore closed" subsystem_name=catalog
time="2025-07-15T12:07:21Z" level=error msg="Fatal run error" error="listen unix /tmp/spire-server/private/api.sock: bind: permission denied"
time="2025-07-15T12:07:21Z" level=error msg="Server crashed" error="listen unix /tmp/spire-server/private/api.sock: bind: permission denied"

How can we reproduce the issue?

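  1. Install Cilium with SPIRE enabled (authentication.mutual.spire.enabled and authentication.mutual.spire.install.enabled set to true)
  2. Override the SPIRE agent and server images with version 1.12.4
  3. Start the cluster; the SPIRE server crashes with the error above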
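
A minimal sketch of steps 1 and 2 as a single OpenTofu resource, keeping only the SPIRE-related values from the full configuration further down. It is untested and it is not confirmed that this reduced set reproduces the crash. The resource name cilium_spire_repro is hypothetical, and setting useDigest = false is an assumption: the idea is that the chart then pulls by tag only, instead of combining the 1.12.4 tag with the chart's default digest for v1.9.6.

```hcl
# Hypothetical minimal reproduction: SPIRE-related values only,
# with the SPIRE images selected by tag instead of by digest.
resource "helm_release" "cilium_spire_repro" {
  repository = "https://helm.cilium.io/"
  chart      = "cilium"
  version    = "1.17.5"

  name      = "cilium"
  namespace = "kube-system"

  values = [
    yamlencode({
      authentication = {
        mutual = {
          spire = {
            enabled = true
            install = {
              enabled = true
              agent = {
                image = {
                  repository = "ghcr.io/spiffe/spire-agent"
                  tag        = "1.12.4"
                  useDigest  = false # assumption: pull by tag only
                }
              }
              server = {
                image = {
                  repository = "ghcr.io/spiffe/spire-server"
                  tag        = "1.12.4"
                  useDigest  = false # assumption: pull by tag only
                }
                # same security context as in the full configuration
                podSecurityContext = {
                  runAsUser  = 1000
                  runAsGroup = 1000
                  fsGroup    = 1000
                }
              }
            }
          }
        }
      }
    })
  ]
}
```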

Cilium Version

1.17.5

Kernel Version

$ uname -a
Linux ip-10-0-30-117.eu-central-2.compute.internal 6.1.141-155.222.amzn2023.aarch64 #1 SMP Tue Jun 17 10:29:19 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux

Kubernetes Version

Client Version: v1.31.2
Kustomize Version: v5.4.2
Server Version: v1.32.5-eks-5d4a308

Regression

Yes

Sysdump

No response

Relevant log output

Anything else?

To save you the trouble of finding the SHAs and versions:

Server versions:
1.10.4: sha256:4e77ca7017e5279af36cd2bc2fa9ba06a164b5bc352404a03bc4812881498ee8
1.11.3: sha256:fb2829b45b9c2b1ee47c3264de76375f03e9c0a0ae37188404b103aaeccadb46
1.12.4: sha256:34147f27066ab2be5cc10ca1d4bfd361144196467155d46c45f3519f41596e49

Agent versions:
1.10.4: sha256:ecb3e00fa38d38e0166a678e4c4c1e1b3517b63a37e2a57154e7f5d5d0ee1f98
1.11.3: sha256:38b39a91441d646aba5fa518fffda30a14918f24af117b483529a271a96e5001
1.12.4: sha256:163970884fba18860cac93655dc32b6af85a5dcf2ebb7e3e119a10888eff8fcd

Deployed via OpenTofu with:

resource "helm_release" "cilium" {
  repository = "https://helm.cilium.io/"
  chart      = "cilium"
  version    = var.cilium_version

  name      = "cilium"
  namespace = "kube-system"
  # allow ample time at cluster bootstrap mainly due to ASG nodes coming up
  timeout = "900"

  set {
    name  = "cni.exclusive"
    value = "true"
  }

  set {
    name  = "enableIPv4Masquerade"
    value = "false"
  }

  set {
    name  = "routingMode"
    value = "native"
  }

  set {
    name  = "ipam.mode"
    value = "eni"
  }

  set {
    name  = "eni.enabled"
    value = "true"
  }

  set {
    name  = "eni.awsEnablePrefixDelegation"
    value = "true"
  }

  set {
    name  = "egressMasqueradeInterfaces"
    value = "ens+"
  }

  set {
    name  = "endpointRoutes.enabled"
    value = "true"
  }

  set {
    name  = "hubble.relay.enabled"
    value = "true"
  }

  set {
    name  = "hubble.ui.enabled"
    value = "true"
  }

  set {
    name  = "hubble.metrics.enabled"
    value = "{dns,drop,tcp,flow,port-distribution,icmp,httpV2:exemplars=true;labelsContext=source_ip\\,source_namespace\\,source_workload\\,destination_ip\\,destination_namespace\\,destination_workload\\,traffic_direction}"
  }

  # enables cilium agent metrics
  set {
    name  = "prometheus.enabled"
    value = "true"
  }

  # enables cilium operator metrics
  set {
    name  = "operator.prometheus.enabled"
    value = "true"
  }

  # given this, do not install kube-proxy or expect to see it
  set {
    name  = "kubeProxyReplacement"
    value = "true"
  }

  set {
    name  = "k8sServiceHost"
    value = replace(module.eks.cluster_endpoint, "https://", "")
  }

  set {
    name  = "k8sServicePort"
    value = "443"
  }

  # e2e encryption, node <-> pod, pod <-> pod
  set {
    name  = "encryption.enabled"
    value = "true"
  }

  set {
    name  = "encryption.type"
    value = "wireguard"
  }

  set {
    name  = "encryption.nodeEncryption"
    value = "true"
  }

  # mTLS (through SPIRE) - using values block for complex objects
  values = [
    yamlencode({
      authentication = {
        mutual = {
          spire = {
            install = {
              agent = {
                image = {
                  digest     = var.cilium_spire_version.agent.sha256
                  repository = "ghcr.io/spiffe/spire-agent"
                  tag        = var.cilium_spire_version.agent.version
                  pullPolicy = "IfNotPresent"
                  useDigest  = true
                }
              }
              server = {
                image = {
                  digest     = var.cilium_spire_version.server.sha256
                  repository = "ghcr.io/spiffe/spire-server"
                  tag        = var.cilium_spire_version.server.version
                  pullPolicy = "IfNotPresent"
                  useDigest  = true
                }
                # Added when moving from the chart's default SPIRE v1.9.6 to 1.10.x.
                # When upgrading an existing install, adjust the EBS volume permissions or start over.
                # See PR #3722
                podSecurityContext = {
                  runAsUser  = 1000
                  runAsGroup = 1000
                  fsGroup    = 1000
                }
              }
            }
          }
        }
      }
    })
  ]

  set {
    name  = "authentication.mutual.spire.enabled"
    value = "true"
  }

  set {
    name  = "authentication.mutual.spire.install.enabled"
    value = "true"
  }

  # node firewall
  set {
    name  = "hostFirewall.enabled"
    value = "true"
  }

  set {
    name  = "envoy.idleTimeoutDurationSeconds"
    value = "180"
  }

  set {
    name  = "operator.replicas"
    value = "3"
  }

  set {
    name  = "dnsProxy.dnsRejectResponseCode"
    value = "nameError"
  }
}
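
The configuration above reads var.cilium_spire_version.agent.sha256, .agent.version, .server.sha256 and .server.version, but the variable declaration itself is not part of the report. A sketch of how it could be declared, assuming a plain object type and using the 1.12.4 digests listed earlier; whether the sha256 field holds the "sha256:" prefix or only the hex digest is an assumption here.

```hcl
# Hypothetical declaration of the variable referenced by the helm_release.
variable "cilium_spire_version" {
  description = "SPIRE agent and server image tags and digests"
  type = object({
    agent = object({
      version = string
      sha256  = string
    })
    server = object({
      version = string
      sha256  = string
    })
  })
  default = {
    agent = {
      version = "1.12.4"
      # assumption: digest stored with its "sha256:" prefix
      sha256 = "sha256:163970884fba18860cac93655dc32b6af85a5dcf2ebb7e3e119a10888eff8fcd"
    }
    server = {
      version = "1.12.4"
      sha256  = "sha256:34147f27066ab2be5cc10ca1d4bfd361144196467155d46c45f3519f41596e49"
    }
  }
}
```

Switching to 1.10.4 or 1.11.3 only means swapping in the corresponding tag and digest pairs from the lists above.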

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

      • area/agent: Cilium agent related.
      • area/servicemesh: GH issues or PRs regarding servicemesh.
      • feature/authentication
      • help-wanted: You can help! Post a detailed plan on the issue or create a PR to solve this issue.
      • kind/bug: This is a bug in the Cilium logic.
      • kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
      • pinned: These issues are not marked stale by our issue bot.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests