Skip to content

The Vault issuer can now be given a serviceAccountRef#4524

Closed
maelvls wants to merge 1 commit intocert-manager:masterfrom
maelvls:projected-sa-vault
Closed

The Vault issuer can now be given a serviceAccountRef#4524
maelvls wants to merge 1 commit intocert-manager:masterfrom
maelvls:projected-sa-vault

Conversation

@maelvls
Copy link
Copy Markdown
Member

@maelvls maelvls commented Oct 15, 2021

🌟 The solution that this PR partially implements is documented in Proposed solution: cert-manager requests a token using the Token Request API

As detailed in #4144, the Vault issuer is only able to use a Secret in order to use the "Kubernetes authentication" method. The downside to this service account Secret token is that it has the default JWT is "kubernetes/serviceaccount" (along with the fact that the token is not bound to a particular pod and has no expiry).

With the new serviceAccountRef field, cert-manager now requests the token on behalf of the pod in order to authenticate with Vault.

I manually tested the feature on these environments:

  • kind cluster (Kubernetes 1.21) with an internal Vault running as a deployment.
  • EKS cluster (Kubernetes 1.20) with an internal Vault running as a deployment.

Still to be done:

Notes to the reviewer:

  1. No token rotation is done here, which might become a problem since a TokenRequest API call to the apiserver is made every time the Vault client is created, which happens every time a resync of a CertificateRequest referencing a Vault issuer.
  2. This approach requires cert-manager to be given even more permissions. The user now has to create an RBAC Role and RoleBinding for the service account they want to use.
  3. The expiry is currently hardcoded to 2 hours. It is configurable now.
    3.~ The audience is currently hardcoded to vault.~ It is configurable now.
  4. I had to change the vault.auth.kubernetes.secretRef from mandatory to optional; I don't think that's a breaking change since I am relaxing constraints.

UPDATE: another idea, suggested in #2345, is to reuse cert-manager's own token to do the authentication. I don't know what are the security implications of this yet.

If you want to test this feature, here are the steps:

./devel/addon/certmanager/install.sh
helm upgrade --install vault hashicorp/vault --set server.dev.enabled=true --set global.tlsDisable=true --wait
kubectl wait --for=condition=Ready pod vault-0

kubectl exec vault-0 -i -- vault secrets enable pki
kubectl exec vault-0 -i -- vault secrets tune -max-lease-ttl=175200h pki
kubectl exec vault-0 -i -- vault write pki/root/generate/internal common_name=example.com key_type=ec key_bits=256 ttl=175200h
kubectl exec vault-0 -i -- vault write pki/config/urls \
    issuing_certificates="http://vault.default:8200/v1/pki/ca" \
    url_distribution_points="http://vault.default:8200/v1/pki/crl"
kubectl exec vault-0 -i -- vault write pki/roles/vault-issuer allowed_domains=cluster.local allow_subdomains=true max_ttl=48h key_type=ec key_bits=256

kubectl exec vault-0 -i -- vault auth enable kubernetes
kubectl exec vault-0 -i -- sh -c 'vault write auth/kubernetes/config \
   token_reviewer_jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
   kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443" \
   kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
   issuer=https://kubernetes.default.svc.cluster.local'

kubectl exec vault-0 -i -- vault policy write vault-issuer - <<EOF
path "pki*"                 { capabilities = ["read", "list"] }
path "pki/roles/vault-issuer"   { capabilities = ["create", "update"] }
path "pki/sign/vault-issuer"    { capabilities = ["create", "update"] }
path "pki/issue/vault-issuer"   { capabilities = ["create"] }
EOF
kubectl exec vault-0 -i -- vault write auth/kubernetes/role/vault-issuer bound_service_account_names=vault-issuer bound_service_account_namespaces=default policies=vault-issuer ttl=20m

Then, create an issuer and a certificate:

kubectl apply -f- <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vault-issuer
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: vault-issuer
  namespace: default
rules:
  - apiGroups: ['']
    resources: ['serviceaccounts/token']
    resourceNames: ['vault-issuer']
    verbs: ['create']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: vault-issuer
  namespace: default
subjects:
  - kind: ServiceAccount
    name: cert-manager
    namespace: cert-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: vault-issuer
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: vault-issuer
  namespace: default
spec:
  vault:
    path: pki/sign/vault-issuer
    server: http://vault.default.svc.cluster.local:8200
    auth:
      kubernetes:
        role: vault-issuer
        mountPath: /v1/auth/kubernetes
        serviceAccountRef:
          name: vault-issuer # ✨
          expirationSeconds: 600 # ✨
          audience: vault # ✨
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
spec:
  secretName: example-com-tls
  issuerRef:
    name: vault-issuer
  commonName: example.cluster.local
  dnsNames:
  - example.cluster.local
  privateKey:
    algorithm: ECDSA
EOF

If you want to run the new e2e locally:

make -j e2e-setup-certmanager e2e-setup-vault
make e2e GINKGO_FOCUS=".*should be ready with a valid serviceAccountRef"

/kind feature
/area vault

The Vault issuer can now be used with projected tokens instead of a service account Secret.

@jetstack-bot
Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@jetstack-bot jetstack-bot added kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. area/vault Indicates a PR directly modifies the Vault Issuer code size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/api Indicates a PR directly modifies the 'pkg/apis' directory area/deploy Indicates a PR modifies deployment configuration labels Oct 15, 2021
@jetstack-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maelvls

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jetstack-bot jetstack-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 15, 2021
@maelvls maelvls force-pushed the projected-sa-vault branch 2 times, most recently from 2b33c21 to 99fee0d Compare October 15, 2021 12:34
@jetstack-bot jetstack-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 22, 2021
@maelvls
Copy link
Copy Markdown
Member Author

maelvls commented Oct 28, 2021

I have tested this PR with EKS 1.20 (using eksctl). It works as expected.

verbs: ["create", "patch"]
- apiGroups: [""]
resources: ["serviceaccounts/token"]
verbs: ["create"]
Copy link
Copy Markdown
Member Author

@maelvls maelvls Oct 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-review: I should document why this is required; and most importantly that it is only required when using Vault + Kubernetes auth + serviceAccountRef.

Comment on lines +105 to +110
if kubeAuth != nil && (len(kubeAuth.SecretRef.Name) == 0 && len(kubeAuth.ServiceAccountRef.Name) == 0) {
logf.V(logf.WarnLevel).Infof("%s: %s", v.issuer.GetObjectMeta().Name, messageKubeAuthSingleRequired)
apiutil.SetIssuerCondition(v.issuer, v.issuer.GetGeneration(), v1.IssuerConditionReady, cmmeta.ConditionFalse, errorVault, messageKubeAuthSingleRequired)
return nil
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-review: would it be worth adding these "runtime" constraints to the webhook validation too?

@krx252525
Copy link
Copy Markdown

We just hit this in one of our dev environments. What's the likely timeline for a release assuming this solution is accepted (I see some alternatives being mentioned)?

@maelvls
Copy link
Copy Markdown
Member Author

maelvls commented Nov 3, 2021

Hi. If this solution is accepted (it seems like the discussion is going towards implementing it with serviceAccountRef), I plan on getting this PR merged for 1.7, but you can expect a 1.7.0-alpha.0 to test the feature probably next week or the week after.

@maelvls
Copy link
Copy Markdown
Member Author

maelvls commented Nov 5, 2021

⚠️ I paused this PR due to the security concerns with regards to the RBAC rules required by this PR

I feel more and more uncomfortable with the idea of giving even more privileges to cert-manager. More specifically, this feature requires adding to everybody's cert-manager a "god-like" role:

- apiGroups: [""]
  resources: [serviceaccounts/token]
  verbs: [create]

With this new role, cert-manager is able to make requests to the apiserver using any service account that exists across all namespaces. If one of them has more privileges than cert-manager's own service account, it is a privilege escalation.

I have looked around to see if anyone else is using this role and the TokenRequest API, and whether there is a way to "minimize" the impact of this new role.

It seems like only cloud providers controllers, such as cloud-provider-aws, have "seriously" been using this feature.

cloud-provider-aws is only able to request tokens for three service accounts using resourceNames:

- apiGroups:
  - ""
  resourceNames:
  - node-controller
  - service-controller
  - route-controller
  resources:
  - serviceaccounts/token
  verbs:
  - create

The strings node-controller, service-controller and route-controller are
"controller roles" as described in Roles for built-in controllers. These are service accounts are visible in the kube-system namespace.

Previously, the Vault issuer was only able to use a Secret in order to
use the "Kubernetes authentication" method. The downside to this service
account Secret token is that it has the default JWT iss
"kubernetes/serviceaccount" (along with the fact that the token is not
bound to a particular pod and has no expiry).

With the new serviceAccountRef, cert-manager now requests the token on
behalf of the pod in order to authenticate with Vault.

Signed-off-by: Maël Valais <mael@vls.dev>
@maelvls maelvls force-pushed the projected-sa-vault branch from 99fee0d to ddfdf4d Compare November 5, 2021 13:35
@jetstack-bot jetstack-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2021
@Tolsto
Copy link
Copy Markdown
Contributor

Tolsto commented Nov 8, 2021

I would also prefer the approach outlined in #2345. We currently use Banzai Bank-Vaults to inject Vault secrets into the pod's ENV which also makes requests to Vault using the pod's service account token.

@maelvls
Copy link
Copy Markdown
Member Author

maelvls commented Dec 15, 2021

Hi,

After a discussion on Slack, due to the high level of privileges required for the serviceAccountRef to be implemented this way, I decided to close this PR and try to think of a better way.

On a positive note, #2345 suggests re-using the existing token mounted to the cert-manager pod, which seems a much better approach. The only issue is that when we hand Vault the token, it will have way too many permissions (such as reading secrets). Vault does not require any permission to call the the tokenreview API.

@maelvls maelvls closed this Dec 15, 2021
@SpectralHiss
Copy link
Copy Markdown
Contributor

SpectralHiss commented Mar 18, 2022

As we can see the implementation for this particular feature presented multiple security challenges, that are currently reflected in https://hackmd.io/MNQf9ReSR6GdkFl04oS-fA

However, It seems that even the Vault team is actually advising to set "disable_iss_validation=true" (or leave blank for 1.9+ as it's default) on the authentication engine:
This would be the most straightforward fix and the threat modelling from Vault seems to indicate this is a "double" validation anyways.
So to summarise there are 2 main options at present:

  • Disable iss validation on Kubernetes engine as Vault post suggests.
  • Have two different Kubernetes Auth engines with different iss validation values and bind service accounts accordingly, this is more secure but add a configuration burden.

If you're actually already on Vault 1.9 the iss issuer validation, which is causing this whole issue is being deprecated.
this means installing for example on EKS 1.21 with Vault 1.9+ without setting the issuer for the Kubernetes engine is now recommended (because of this exact issue) as first reported here, thanks to @avoidik for the heads' up!

So that first workaround of disabling the disable_iss_validation=true for (1.9 or below) and leaving blank from 1.9+ is now the most sensible approach for Kubernetes vault auth engine, which sidesteps the issue.

Aside from issuer validation, we would want to natively support bounded service accounts as it adds an extra layer of security but we can now say conclusively that this is not a blocker to cert-manager vault issuer setup in Kubernetes 1.21+.
Thus I think we can safely consider this issue closed (and not paused) and open a separate one for bounded service account support for issuers in general in cert-manager.

tomhjp added a commit to tomhjp/cert-manager that referenced this pull request Jul 4, 2022
@maelvls
Copy link
Copy Markdown
Member Author

maelvls commented Jan 11, 2023

Note: I re-created this PR in #5502.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/api Indicates a PR directly modifies the 'pkg/apis' directory area/deploy Indicates a PR modifies deployment configuration area/vault Indicates a PR directly modifies the Vault Issuer code dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants