VCert's "auto-retry" feature (i.e., resetting the certificate if it has failed) causes a race condition in TPP, resulting in the error "unmatched key modulus" #273

@maelvls

When using VCert v4.23.0, with TPP 22.4.1 (also tested with TPP 20.4.0), I often receive the following error message when requesting a certificate:

vcert error: your data contains problems: request doesn't match certificate: unmatched key modulus

I checked that the problem does not come from re-using the private key. I can confirm that the CSR and the issued certificate are mismatched.
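One way to confirm such a mismatch by hand is to compare the RSA modulus of the CSR with that of the returned certificate. The following is a minimal sketch, assuming both files are PEM-encoded and use RSA keys; the function name `compare_modulus` and the file names are illustrative and are not part of VCert.

```shell
#!/bin/bash
# Prints "match" when the CSR and the certificate carry the same RSA modulus,
# and "mismatch" otherwise. Returns 2 if openssl cannot read one of the files.
compare_modulus() {
  local csr_file=$1 cert_file=$2 csr_mod cert_mod
  csr_mod=$(openssl req -in "$csr_file" -noout -modulus) || return 2
  cert_mod=$(openssl x509 -in "$cert_file" -noout -modulus) || return 2
  if [ "$csr_mod" = "$cert_mod" ]; then
    echo match
  else
    echo mismatch
  fi
}

# Usage (hypothetical file names):
#   compare_modulus app1.csr app1.crt
```

When the bug described here occurs, this check would be expected to print "mismatch", because the certificate handed back is the previous one.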

The bug

Affected: TPP 22.4.1 and older, VCert v4.23.0, cert-manager v1.11.0 (just this one version), venafi-enhanced-issuer v0.2.0 and v0.3.0.

Fixed in: VCert v4.24.0, cert-manager v1.11.1 and v1.12.0, venafi-enhanced-issuer v0.3.1.

This bug happens systematically when all of the following conditions are met:

  1. Only happens during renewal (does not happen during the initial enrollment).
  2. Only happens after a first renewal attempt has failed (e.g., the CA was down).
  3. Only happens if the second renewal attempt also fails (e.g., the CA was still down).

In real-world usage, this means that the workaround for "Click retry" introduced in #269 only "works" 50% of the time. This is still better than before #269, when you would get stuck with "Click retry" 100% of the time, but the error is now less descriptive.

Workaround: renew the certificate once again (provided that this third attempt succeeds; otherwise, the fourth attempt will also fail, and so on).
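The "renew once again" workaround could be scripted along these lines. This is only a sketch: the wrapper name `retry_on_modulus_error` is made up for illustration, and it assumes that the wrapped command (for example `vcert enroll ...`) exits with a non-zero status when it fails, which this issue does not verify.

```shell
#!/bin/bash
# Runs the given command up to $1 times. It retries only when the output
# contains "unmatched key modulus"; any other failure is reported right away.
retry_on_modulus_error() {
  local max=$1 out i
  shift
  for ((i = 0; i < max; i++)); do
    if out=$("$@" 2>&1); then
      printf '%s\n' "$out"
      return 0
    fi
    case $out in
      *"unmatched key modulus"*) continue ;;  # the bug described here: try again
      *) printf '%s\n' "$out" >&2; return 1 ;;  # a different error: give up
    esac
  done
  printf '%s\n' "$out" >&2
  return 1
}

# Usage (assumes TOKEN and TPP_URL are set):
#   retry_on_modulus_error 3 vcert enroll -u "$TPP_URL" -t "$TOKEN" \
#     --cn app1.example.com -z 'application-team-1' --san-dns=app1.example.com
```

If the CA is still down, the retry stops on the 500 error instead of looping, which matches the alternating behavior described in this issue.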

When this bug occurs, VCert and cert-manager will show the following message:

request doesn't match certificate: unmatched key modulus

Problem

This unexpected behavior seems to happen when request and reset(restart=true) are called back to back. When that happens, TPP gives VCert the old certificate instead of returning a 500 error.

Note that the requests VCert makes are asynchronous (WorkToDoTimeout=0), so request never returns a 500. Only retrieve calls may return a 500.

The following flow triggers the problem:

# Fresh certificate, CA is down.
request
reset(restart=true)
retrieve
# ❌ Returns 200 with the old certificate.

Even with a 1 second pause between requesting and resetting, the problem still occurs:

# Fresh certificate, CA is down.
request
sleep 1s
reset(restart=true)
retrieve
# ❌ Returns 200 with the old certificate.

We found that waiting for 5 seconds allows you to work around the problem. We also found that using reset(restart=false) before requesting doesn't trigger the problem.

# Fresh certificate, CA is down.
request
sleep 5s
reset(restart=true)
retrieve
# ✅ Returns 500 as expected.

# Fresh certificate, CA is down.
reset(restart=false)
request
retrieve
# ✅ The expected 500 HTTP code is returned.
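The last flow (reset without restart, then request, then retrieve) can be sketched with curl as follows. This is an illustration under assumptions, not the fix that shipped in VCert: `tpp_post` and `reset_then_request` are hypothetical helper names, `TOKEN` and `TPP_URL` are expected to be set as in the reproduction steps further down, and the `DRY_RUN` switch exists only to make the call order visible without a TPP server.

```shell
#!/bin/bash
# Hypothetical helper: POSTs a JSON body to $TPP_URL/vedsdk/certificates/<endpoint>.
# With DRY_RUN set, it only prints the call instead of sending it.
tpp_post() {
  local endpoint=$1 body=$2
  if [ -n "${DRY_RUN:-}" ]; then
    printf 'POST %s %s\n' "$endpoint" "$body"
    return 0
  fi
  curl -skS -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
    "$TPP_URL/vedsdk/certificates/$endpoint" -d "$body"
}

# Resets first (Restart=false), then requests, then retrieves.
# $1 is the JSON-escaped certificate DN, $2 is the JSON body of the request call.
reset_then_request() {
  local dn=$1 request_body=$2
  tpp_post reset "{\"CertificateDN\":\"$dn\",\"Restart\":false}" &&
    tpp_post request "$request_body" &&
    tpp_post retrieve "{\"CertificateDN\":\"$dn\",\"Format\":\"base64\",\"IncludePrivateKey\":false}"
}
```

Running `DRY_RUN=1 reset_then_request '\\VED\\Policy\\application-team-1\\app1.example.com' '{...}'` prints the three calls in order, with reset first.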

Reproducing "unmatched key modulus" with vcert

First, set your .envrc:

#!/bin/bash
export TPP_URL=https://tpp.mael-valais-gcp.jetstacker.net
export TPP_USER=cert_manager
export TPP_PWD=$(lpass show -p tpp.mael-valais-gcp.jetstacker.net)
export TPP_CLIENT_ID=edit-mappings

Then, get a token:

TOKEN=$(vcert getcred -u $TPP_URL --username=$TPP_USER --password $TPP_PWD --client-id=$TPP_CLIENT_ID --scope=certificate:manage,revoke,delete --format json | tee /dev/stderr | jq -r .access_token) && export TOKEN

Then, make sure that the certificate doesn't already exist:

curl -D/dev/null -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -X DELETE $TPP_URL/vedsdk/certificates/$(curl -D/dev/null -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" $TPP_URL/vedsdk/config/dntoguid -d '{"ObjectDN":"\\VED\\Policy\\application-team-1\\app1.example.com"}' | tee /dev/stderr | jq .GUID -r | tr -d '{}')
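A note on the backslashes in the DN, since they are easy to get wrong: inside a JSON string each backslash of the DN has to be doubled. In a single-quoted `-d` argument, as above, the shell leaves `\\` untouched. In the unquoted heredocs used later in this issue, the shell turns `\\` into `\`, which is why the DN is written there with four backslashes. The sketch below shows the escaping step on its own; `json_escape_dn` is a hypothetical helper name.

```shell
#!/bin/bash
# Doubles every backslash so that a raw DN such as \VED\Policy\app can be
# embedded in a JSON string.
json_escape_dn() {
  printf '%s\n' "$1" | sed 's/\\/\\\\/g'
}

dn='\VED\Policy\application-team-1\app1.example.com'
printf '{"ObjectDN":"%s"}\n' "$(json_escape_dn "$dn")"
# prints {"ObjectDN":"\\VED\\Policy\\application-team-1\\app1.example.com"}
```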

Then, make sure that the AD CS service (i.e., Microsoft Certification Authority) is running:

gcloud compute ssh --project jetstack-mael-valais --zone europe-west1-c cert_manager@tpp -- net start certsvc

Then, enroll a certificate. It should succeed:

vcert enroll -u https://tpp.mael-valais-gcp.jetstacker.net -t "$TOKEN" --cn app1.example.com -z 'application-team-1' --san-dns=app1.example.com

Then, turn the CA off:

gcloud compute ssh --project jetstack-mael-valais --zone europe-west1-c cert_manager@tpp -- net stop certsvc

Then, enroll. It should show an error:

$ vcert enroll -u https://tpp.mael-valais-gcp.jetstacker.net -t "$TOKEN" --cn app1.example.com -z 'application-team-1' --san-dns=app1.example.com

Enter key passphrase:
Verifying - Enter key passphrase:
vCert: 2023/01/20 15:25:36 Successfully connected to Trust Protection Platform
vCert: 2023/01/20 15:25:36 Successfully read zone configuration for application-team-1
vCert: 2023/01/20 15:25:37 Successfully created request for app1.example.com
vCert: 2023/01/20 15:25:37 Successfully posted request for app1.example.com, will pick up by \VED\Policy\application-team-1\app1.example.com
vCert: 2023/01/20 15:25:37 unable to retrieve: Unexpected status code on TPP Certificate Retrieval. Status: 500 Certificate \VED\Policy\application-team-1\app1.example.com has encountered an error while processing, Status: Post CSR failed with error: Cannot connect to the certificate authority (CA). Verify that your CA template settings are correct and that the remote server is available. For more information, search the Help system for Configuring the Microsoft Certificate Services Template Object., Stage: 500.

If you do it again, you will get the key modulus error:

$ vcert enroll -u https://tpp.mael-valais-gcp.jetstacker.net -t "$TOKEN" --cn app1.example.com -z 'application-team-1' --san-dns=app1.example.com

Enter key passphrase:
Verifying - Enter key passphrase:
vCert: 2023/01/20 15:26:51 Successfully connected to Trust Protection Platform
vCert: 2023/01/20 15:26:51 Successfully read zone configuration for application-team-1
vCert: 2023/01/20 15:26:51 Successfully created request for app1.example.com
vCert: 2023/01/20 15:26:52 Successfully posted request for app1.example.com, will pick up by \VED\Policy\application-team-1\app1.example.com
vCert: 2023/01/20 15:26:54 vcert error: your data contains problems: request doesn't match certificate: unmatched key modulus

vcert-request-reset-incorrect.har.zip

If you do it a third time, it will show the correct 500 error, since reset won't be called:

$ vcert enroll -u https://tpp.mael-valais-gcp.jetstacker.net -t "$TOKEN" --cn app1.example.com -z 'application-team-1' --san-dns=app1.example.com

Enter key passphrase:
Verifying - Enter key passphrase:
vCert: 2023/01/20 15:25:36 Successfully connected to Trust Protection Platform
vCert: 2023/01/20 15:25:36 Successfully read zone configuration for application-team-1
vCert: 2023/01/20 15:25:37 Successfully created request for app1.example.com
vCert: 2023/01/20 15:25:37 Successfully posted request for app1.example.com, will pick up by \VED\Policy\application-team-1\app1.example.com
vCert: 2023/01/20 15:25:37 unable to retrieve: Unexpected status code on TPP Certificate Retrieval. Status: 500 Certificate \VED\Policy\application-team-1\app1.example.com has encountered an error while processing, Status: Post CSR failed with error: Cannot connect to the certificate authority (CA). Verify that your CA template settings are correct and that the remote server is available. For more information, search the Help system for Configuring the Microsoft Certificate Services Template Object., Stage: 500.

If you call it a fourth time, it will show "unmatched key modulus" again, and so on and so forth.

Reproducing with curl

❌ Renewal of OK certificate. CA is down. Flow: request > reset(restart=true)

Occurrence of this scenario in VCert: 100% of the time when all of the following conditions are met:

  1. Only happens during renewal (does not happen during the initial enrollment).
  2. Only happens after a first renewal attempt.
  3. Only happens if the second renewal attempt fails.

There is an easy workaround: re-renewing the certificate (given that this third attempt succeeds; otherwise, the fourth attempt will also fail, and so on).

Before running this test, I turn the CA on, issue a certificate, and turn the CA off:

curl -D/dev/null -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -X DELETE $TPP_URL/vedsdk/certificates/$(curl -D/dev/null -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" $TPP_URL/vedsdk/config/dntoguid -d '{"ObjectDN":"\\VED\\Policy\\application-team-1\\app1.example.com"}' | tee /dev/stderr | jq .GUID -r | tr -d '{}')
gcloud compute ssh --project jetstack-mael-valais --zone europe-west1-c cert_manager@tpp -- net start certsvc
vcert enroll -u $TPP_URL -t "$TOKEN" --cn app1.example.com -z 'application-team-1' --san-dns=app1.example.com
gcloud compute ssh --project jetstack-mael-valais --zone europe-west1-c cert_manager@tpp -- net stop certsvc

The actual test:

#!/bin/bash
openssl genrsa -out crt.key 2048
curl -X POST https://tpp.mael-valais-gcp.jetstacker.net/vedsdk/certificates/request -w ' %{http_code}\n' -skS -D/dev/null -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d@- <<EOF
{
  "PolicyDN": "\\\\VED\\\\Policy\\\\application-team-1",
  "CASpecificAttributes": [{ "Name": "Origin", "Value": "curl" }],
  "Origin": "curl",
  "PKCS10": $(step certificate create app1.example.com --san app1.example.com --csr --key crt.key /dev/stdout -f | jq -R --slurp),
  "KeyAlgorithm": "RSA",
  "KeyBitSize": 2048,
  "DisableAutomaticRenewal": true,
  "CADN":"\\\\VED\\\\Policy\\\\Administration\\\\msca"
}
EOF
curl -sS -D/dev/null -skSH "Authorization: Bearer $TOKEN" -w ' %{http_code}\n' -H "Content-Type: application/json" -o/dev/stdout https://tpp.mael-valais-gcp.jetstacker.net/vedsdk/certificates/reset -d '{"CertificateDN":"\\VED\\Policy\\application-team-1\\app1.example.com", "Restart":true}'
while :; do curl -sS -D/dev/null -w ' %{http_code}\n' -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -o/dev/stdout https://tpp.mael-valais-gcp.jetstacker.net/vedsdk/certificates/retrieve -d '{"CertificateDN":"\\VED\\Policy\\application-team-1\\app1.example.com", "Format": "base64", "IncludePrivateKey": false}'; sleep 1; done

For some reason, the retrieve call returns 200 OK instead of 500 Internal Server Error, and the returned certificate doesn't match the CSR. The certificate corresponds to the old certificate that was meant to be renewed:

{"CertificateDN":"\\VED\\Policy\\application-team-1\\app1.example.com","Guid":"{eea3a1db-0602-4be3-9a88-da1be4dcc855}"} 200
{"ProcessingResetCompleted":true,"RestartCompleted":true} 200
{"Stage":-1,"Status":"Queued for renewal"} 202
{"Stage":-1,"Status":"Queued for renewal"} 202
{"CertificateData":"LS0tLS1CRUdJTi...URS0tLS0tDQo=","Filename":"app1.example.com.cer","Format":"base64"} 200
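The polling loop above runs forever and leaves it to the reader to interpret the status codes. A bounded variant could look like the sketch below, which reads the trailing HTTP code that `-w ' %{http_code}\n'` appends to each line. The function names `retrieve_state` and `poll_retrieve` are made up for this example, and the mapping (202 keeps polling, 200 stops, anything else is a failure) is taken only from the outputs shown in this issue.

```shell
#!/bin/bash
# Maps one line of `curl -w ' %{http_code}\n'` output ("<body> <code>") to a state.
retrieve_state() {
  local line=$1
  case ${line##* } in   # the text after the last space is the HTTP status code
    200) echo returned ;;
    202) echo pending ;;
    *)   echo failed ;;
  esac
}

# Runs the given retrieve command up to $1 times, one second apart, and prints
# the first response that is not 202.
poll_retrieve() {
  local max=$1 line i
  shift
  for ((i = 0; i < max; i++)); do
    line=$("$@")
    if [ "$(retrieve_state "$line")" != pending ]; then
      printf '%s\n' "$line"
      return 0
    fi
    sleep 1
  done
  echo "still pending after $max attempts" >&2
  return 1
}
```

Note that "returned" only means that TPP answered 200. As the output above shows, the certificate in that response can be the old one, so the key still has to be compared against the CSR.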

✅ Renewal of OK certificate. CA is down. Flow: request > wait 5s > reset(restart=true)

Before running this test, I turn the CA on, issue a certificate, and turn the CA off:

curl -D/dev/null -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -X DELETE $TPP_URL/vedsdk/certificates/$(curl -D/dev/null -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" $TPP_URL/vedsdk/config/dntoguid -d '{"ObjectDN":"\\VED\\Policy\\application-team-1\\app1.example.com"}' | tee /dev/stderr | jq .GUID -r | tr -d '{}')
gcloud compute ssh --project jetstack-mael-valais --zone europe-west1-c cert_manager@tpp -- net start certsvc
vcert enroll -u $TPP_URL -t "$TOKEN" --cn app1.example.com -z 'application-team-1' --san-dns=app1.example.com
gcloud compute ssh --project jetstack-mael-valais --zone europe-west1-c cert_manager@tpp -- net stop certsvc

Here is the actual test:

#!/bin/bash
openssl genrsa -out crt.key 2048
curl -X POST https://tpp.mael-valais-gcp.jetstacker.net/vedsdk/certificates/request -w ' %{http_code}\n' -skS -D/dev/null -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d@- <<EOF
{
  "PolicyDN": "\\\\VED\\\\Policy\\\\application-team-1",
  "CASpecificAttributes": [{ "Name": "Origin", "Value": "curl" }],
  "Origin": "curl",
  "PKCS10": $(step certificate create app1.example.com --san app1.example.com --csr --key crt.key /dev/stdout -f | jq -R --slurp),
  "KeyAlgorithm": "RSA",
  "KeyBitSize": 2048,
  "DisableAutomaticRenewal": true,
  "CADN":"\\\\VED\\\\Policy\\\\Administration\\\\msca"
}
EOF
sleep 5
curl -sS -D/dev/null -skSH "Authorization: Bearer $TOKEN" -w ' %{http_code}\n' -H "Content-Type: application/json" -o/dev/stdout https://tpp.mael-valais-gcp.jetstacker.net/vedsdk/certificates/reset -d '{"CertificateDN":"\\VED\\Policy\\application-team-1\\app1.example.com", "Restart":true}'
while :; do curl -sS -D/dev/null -w ' %{http_code}\n' -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -o/dev/stdout https://tpp.mael-valais-gcp.jetstacker.net/vedsdk/certificates/retrieve -d '{"CertificateDN":"\\VED\\Policy\\application-team-1\\app1.example.com", "Format": "base64", "IncludePrivateKey": false}'; sleep 1; done

It errors as expected:

{"CertificateDN":"\\VED\\Policy\\application-team-1\\app1.example.com","Guid":"{5333ece9-50a6-474c-9c12-ecaef72868d7}"} 200
{"ProcessingResetCompleted":true,"RestartCompleted":true} 200
{"Stage":500,"Status":"Post CSR failed with error: Cannot connect to the certificate authority (CA). Verify that your CA template settings are correct and that the remote server is available. For more information, search the Help system for Configuring the Microsoft Certificate Services Template Object."} 500
